
The Internet Archive is in danger

2025/1/7

On Point | Podcast


People

Brewster Kahle

Elise Stefanik

James Grimmelmann

Meghna Chakrabarti

Topics

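Elise Stefanik: Stefanik referred to the January 6 participants as "hostages" and expressed concern about the weaponization of the federal government against Trump and conservatives. Her statements were also inconsistent: she condemned the violence on the day of the attack but later changed her account.

Meghna Chakrabarti: The host focuses on how easily the internet, as a historical record, can be erased, and on the importance of the Internet Archive in preserving that record. She uses the deletion of Elise Stefanik's statement as an example of the Archive's value.

Brewster Kahle: The founder of the Internet Archive stresses its importance as the only real public record of the worldwide web and of many other kinds of content, such as old television programs and old books. He notes that the Internet Archive is a free service used by millions of people every day, and that its value lies in preserving historical information and preventing history from being tampered with. He also discusses the Archive's sources of funding and the threat posed by copyright lawsuits.

James Grimmelmann: The law professor analyzes Hachette v. Internet Archive and Universal Music Group v. Internet Archive, explaining the courts' rulings and their potential threat to the Internet Archive. He notes that the court saw technical differences between the Archive's digital lending program and traditional library lending, and therefore held that the program is not protected under copyright law. He also analyzes the litigation strategies of publishers and the music industry against the Archive and their implications for digital copyright and the public interest.

Brewster Kahle: The founder explains in detail how the Internet Archive works, including how it collects, stores, and indexes web pages, and how it partners with other libraries. He also discusses the Archive's funding and the threat of copyright lawsuits, emphasizes that the Archive is key to preserving the digital historical record, and expresses concern about its future.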

Deep Dive

Key Insights

What is the Internet Archive and why is it significant?

The Internet Archive is a nonprofit organization that preserves digital content, including over 900 billion webpages, old television, books, and music. It operates the Wayback Machine, which allows users to access historical versions of websites. It is significant because it serves as a public record of the internet and other digital media, ensuring that historical information remains accessible even if original sources are deleted or altered.

Why is the Internet Archive facing legal challenges?

The Internet Archive is facing lawsuits from publishers and the music industry over copyright issues. In the Hachette case, publishers argued that the Archive's digitization and lending of books violated copyright law. Similarly, Universal Music Group is suing over the Archive's Great 78 Project, which digitizes old 78 RPM records. These cases threaten the Archive's ability to preserve and share digital content.

What is the Great 78 Project and why is it controversial?

The Great 78 Project is an initiative by the Internet Archive to digitize and preserve early 78 RPM records, which are fragile and often unplayable on modern equipment. It is controversial because Universal Music Group claims that the project violates copyright law, despite the Archive's argument that it serves researchers and preserves cultural history.

What are the potential consequences if the Internet Archive shuts down?

If the Internet Archive shuts down, we risk losing access to a vast repository of digital history, including websites, books, music, and television. This could lead to a fragmented and less reliable record of our digital past, with increased reliance on unregulated or illegal archives that may lack the Archive's standards of curation and preservation.

How does the Internet Archive preserve digital content?

The Internet Archive preserves digital content by crawling over 1 billion URLs daily, storing them on hard drives, and indexing them for access through the Wayback Machine. It also digitizes physical media like books and records, making them available for research and public use. The Archive collaborates with about 1,300 libraries worldwide to ensure broad preservation efforts.

What is the Hachette v. Internet Archive case about?

The Hachette v. Internet Archive case involves the Archive's practice of digitizing physical books and lending them digitally under a 'controlled digital lending' model. Publishers argued this violated copyright law, and the court ruled against the Archive, stating that creating and distributing digital copies without publisher authorization infringes on copyright.

What is the significance of the Wayback Machine?

The Wayback Machine is a tool provided by the Internet Archive that allows users to access historical versions of websites. It is significant because it preserves the internet's history, enabling users to see how websites have evolved over time and recover content that has been deleted or altered. It serves as a critical resource for accountability and historical research.

How does the Internet Archive fund its operations?

The Internet Archive is funded through a combination of library payments for digitization services, major donors and foundations, and contributions from end users. It operates on an annual budget of approximately $20-25 million, relying on public support to maintain its free services.

What is the role of libraries in the digital age according to Brewster Kahle?

Brewster Kahle emphasizes that libraries play a crucial role in preserving and providing access to digital content. He argues that libraries, as nonprofit entities, ensure that historical and cultural materials remain available to the public, even as digital formats evolve. Libraries also serve as a counterbalance to corporate control over information.

What is the legal basis for the publishers' case against the Internet Archive?

The publishers' case against the Internet Archive is based on the argument that digitizing and lending books without authorization violates copyright law. The court ruled that the Archive's actions constituted making additional copies of works, which is not protected under the 'first sale' doctrine that applies to physical books.

Chapters

The Internet Archive, specifically the Wayback Machine, is crucial for preserving online history. Copyright lawsuits threaten its existence, raising concerns about the loss of our digital records and the potential for historical manipulation.

Wayback Machine preserves 900 billion web pages
Copyright lawsuits could lead to its shutdown
Loss of online history and potential for historical revisionism are key concerns



This is On Point. I'm Meghna Chakrabarti. Republican Congresswoman Elise Stefanik represents New York's 21st Congressional District. She is one of President-elect Donald Trump's most loyal supporters. And that loyalty has been rewarded. Trump has picked Stefanik to be the next U.S. ambassador to the United Nations when his new administration takes office in 13 days.

Now, yesterday was the fourth anniversary of the January 6th, 2021 riots and attacks on the United States Congress. The certification of Trump's 2024 win went smoothly yesterday, unlike in 2021, when Trump's supporters violently attacked police officers, defecated in the halls of Congress and forced a halt to the peaceful certification of a free and fair election.

On that day, January 6, 2021, Trump, in his first presidency, sat in the Oval Office and watched the entire attack unfold on television. He did not lift a finger to protect the people on Capitol Hill, the nation's representatives, or this system of government, which represents, of course, this nation itself. I'm going over this history because, after a time, people start forgetting.

The forgetting has already begun, partly as a product of the natural passage of time (we simply forget things that happened long ago), but also as a product of the purposeful deletion and rewriting of history. For example, since 2021, hundreds of people have been prosecuted and found guilty by juries of their peers for their various actions on the day they attacked the Capitol.

Donald Trump says one of the first things he'll do after January 20th this year is pardon those people. So this brings me back to Representative Stefanik.

Last year, she was on NBC's Meet the Press, where she called the January 6th rioters "hostages": "I have concerns about the treatment of January 6th hostages. I have concerns. We have a role in Congress of oversight. And I believe that we're seeing the weaponization of the federal government against not just President Trump, but we're seeing it against conservatives. We're seeing it against..." Her answer goes on to talk about Hunter Biden and Hillary Clinton.

Now, this is quite different from what she said on the day of the attack four years ago. On January 6th, 2021, when Congress was able to reconvene in the small hours of the night, Stefanik took to the House floor and made this statement: "Americans will always have the freedom of speech and the constitutional right to protest. But violence in any form is absolutely unacceptable. It is anti-American and must be prosecuted to the fullest extent of the law." Again, that's Representative Stefanik, January 6th, 2021.

Now, we got that clip from C-SPAN and, bless C-SPAN, evidence of what happens on the floor of the House and the floor of the Senate remains there for as long as C-SPAN's archive exists. Stefanik also published a written statement on January 6th, 2021, and posted it on her congressional website.

That statement reads in part, quote, [...] end quote. That was on her website on January 6th, 2021.

But you know what? I'm going to pull up my computer here. Here it is right here. If you look for that statement today (and if you happen to be near a computer, you can do this too), I'm telling you, you will not find it. Here is the URL where that statement once was, and you can try it. I'm going to do it right here. It's https, okay, secure, right? Then it's stefanik.house.gov/2021/1/stefanik-statement-violence-united-states-capitol. That's Stefanik, S-T-E-F-A-N-I-K. Love those SEO URLs. Okay, so then hit enter. I wonder if you got what I got. I got a website that says error: "The page you have requested does not exist or is undergoing routine maintenance." It still says "Elise Stefanik, serving New York's 21st District" in the upper left-hand corner. In other words, that statement on Congresswoman Stefanik's website has been taken down.

Now, the problem here is that, of course, in the 21st century, we use the Internet as our history book, notebooks, bookmarks, primary source, our entire storage and filing system to document the story of ourselves and our nation. So what happens when, at the tap of a key, that story can be so easily erased and then plausibly denied?

What do we lose with the record of our lives and our actions when we lose those very records? Well, there is one place where that record remains preserved. It's called the Wayback Machine, and it's run by the nonprofit Internet Archive. That is how I found Stefanik's 2021 statement. So do this with me again.

Just go to web.archive... actually, let me go back here and copy the original URL. There we go. Okay, then you go to web.archive.org. All right.

When that loads up, there's a box where you can enter the URL of the old page you're looking for. I'm going to just paste it in there. Then you click on one of the captures, the points where the archive scraped the page and preserved it. And there it is: the original statement as it appeared on Stefanik's website the day it was first posted, January 6th, 2021. You see it right there. So what happens?

If the Internet Archive, the Wayback Machine itself, ceased to exist, do we take one more step toward the world of Winston Smith, the hero of George Orwell's classic 1984? Quote: "Every record has been destroyed or falsified, every book rewritten, every picture has been repainted, every statue and street building has been renamed, every date has been altered. And the process is continuing day by day and minute by minute. History has stopped. Nothing exists except an endless present in which the Party is always right." Due to some very important court cases active right now, a world without the Internet Archive is, in reality, not impossible to imagine.
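The Wayback Machine lookup walked through above can also be scripted. The following is a minimal Python sketch, assuming the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` (which reports the capture closest to a requested timestamp) and the `web.archive.org/web/<timestamp>/<url>` replay pattern. The `example.com` URL and the sample JSON response are hypothetical placeholders, not data from this episode.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine "availability" API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def replay_url(original_url: str, timestamp: str) -> str:
    """Build a replay link; a partial timestamp like "20210106" asks for the
    capture nearest that date."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def availability_query(original_url: str, timestamp: str) -> str:
    """URL that asks the availability API for the closest capture."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': original_url, 'timestamp': timestamp})}"


def closest_snapshot(response_body: str):
    """Return (snapshot_url, timestamp) from an availability response, or None
    when the archive reports no capture."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(original_url: str, timestamp: str):
    """Live network call: fetch and parse the closest capture."""
    with urlopen(availability_query(original_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling `lookup("example.com", "20210106")` makes a live request; only `lookup` touches the network, so the URL-building and parsing helpers can be checked offline against a saved response.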

So joining me now is Brewster Kahle. He's the founder of the Internet Archive and a digital librarian and computer engineer. Brewster Kahle, welcome to On Point. Oh, great to be here. Thank you, Meghna. What do you imagine the world might be like if the Internet Archive or the Wayback Machine ceased to exist?

The Internet Archive is the only real public record of the broad, worldwide web, but also all sorts of other things like old television, old books, that are all a cooperative effort of thousands of libraries to build a record of our time and make it as publicly available as we can.

That's what the Internet Archive is. Archive.org is a free service. It's used by millions of people a day. It's about the 200th most popular website of all. The good news is people want old stuff. Yeah, they want old stuff because it's part of what makes us who we are.

And so, I mean, do you dare imagine what we would do if we didn't have access to that old stuff that has only ever existed in this digital format? What would that be like? It would mean that people wouldn't be held as accountable for what they said. But more broadly, I think we just wouldn't be able to remember. We get emails all the time from people who are so delighted that their old websites are still around and available. Or their parents' websites. Or their memories from their youth, their old alma mater. I mean, the World Wide Web is kind of magic in making it so that everyone can be a publisher.

But Tim Berners-Lee's system of the World Wide Web was kind of too simple. It only comes from one place. And that one place can be changed or deleted at any time. The average life of a web page is 100 days before it's changed or deleted. Sometimes on purpose, like what you're saying, they want to change history. And sometimes just because it just fades off.

So we need a record. We need a vibrant library system, and that's what's under threat here. I definitely accept the argument that the first web page I ever made, back in my college days, with the little dancing GIFs of Bart Simpson, doesn't necessarily need preservation. That URL is dead and gone. But what you're saying, more broadly, is that this is who we are. And if who we are only has a lifespan of 100 days (I'll come back to this later), it really calls into question our engagement with the past and our belief in what is true in the present. Now, we have only about a minute left in this first segment, Brewster. Can you remind me when you came up with the idea of the Internet Archive?

The Internet Archive started in 1996, in the early days of the web. It was to build the library we'd been dreaming of forever, one I'd been working on since 1980 and others had been working on for much longer. Having a library system that worked better than what we had growing up has been a lifelong dream for me and many, many others. The Library of Alexandria is a renewed mythological goal: trying to make the published works of humankind available, standing on the shoulders of giants. So for me, it was a pretty obvious step that we needed to do this. The worldwide web was made a little bit in a crufty way, but now we have the Wayback Machine to help fill in some of those problems.

Well, Brewster Kahle, stand by for just a moment, because when we come back, we're going to talk more about how the Internet Archive and the Wayback Machine work. And then, of course, we will dive deep into the court cases that call into question the existence of this archive itself. This is On Point.


You're back with On Point. I'm Meghna Chakrabarti. And today we are talking about the Internet Archive, its Wayback Machine, and the court cases that could potentially threaten the existence of the digital world's most important archive. Brewster Kahle joins us. He's the founder and director of the Internet Archive, as well as a digital librarian and a computer engineer. So, Brewster, how does it work? Like, how are you storing 900 billion web pages? How do you do it?

Oh, it's just the miracles of current computers. We own our own computers at the Internet Archive. It's not in some cloud someplace, which is just somebody else's computers. Libraries take preservation very seriously. There are about 1,300 libraries, including the National Archives and the Library of Congress and 1,000 other libraries, that basically say, "crawl these at this frequency," and so on. We collect over 1 billion URLs every day. One billion. Those are stored in their full original form on hard drives.

Then they're indexed to become the Wayback Machine. So if you go to the Wayback Machine at archive.org, you can just type in a URL, see past versions, and see the web as it was. If you click on, say, a 2001 political website, you can click around that world entirely as it existed then, pulled out of the archive. And you can see all of the changes that were made to every URL, or when they disappeared.
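Seeing "all of the changes that were made to every URL" can be approximated with the Wayback Machine's CDX server, which lists the captures of a URL. This Python sketch rests on assumptions: it uses the CDX endpoint at `https://web.archive.org/cdx/search/cdx` with `output=json` and `fl=timestamp,digest`, and it treats a new content `digest` between consecutive captures as a change to the page. The capture rows in the test are invented for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query(original_url: str) -> str:
    """URL listing every capture of `original_url` as JSON rows."""
    params = {"url": original_url, "output": "json", "fl": "timestamp,digest"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def change_points(cdx_json: str) -> list[str]:
    """Timestamps where a capture's content differs from the one before it.

    With output=json the first row is a header (here ["timestamp", "digest"]).
    The digest is a hash of the captured payload, so a new digest means the
    page content changed. The first capture is always included.
    """
    rows = json.loads(cdx_json)
    if not rows:
        return []  # no captures at all
    header, captures = rows[0], rows[1:]
    ts_i, dg_i = header.index("timestamp"), header.index("digest")
    changes, previous = [], None
    for row in captures:
        if row[dg_i] != previous:
            changes.append(row[ts_i])
            previous = row[dg_i]
    return changes
```

Adding `statuscode` to the `fl` list would be one way to also flag captures where the page had disappeared, for example a 404 response.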

Yeah. It's used by millions of people a day. That's what I did for the press release that Congresswoman Stefanik put out in 2021. I can see I still have the Wayback Machine page open here. There it is, beginning, quote, "This is truly a tragic day for America. I fully condemn the dangerous violence." And there's a little timeline right at the top of the page that shows all the times that page was scraped by the Wayback Machine. Does it even show the changes to the page every time? Yes, on the upper right, you can click to see if they've changed a word or phrase, but often pages just completely disappear.

And so you can see the evolution. Past editions have always been very important. It's the memory hole problem. It's the 1984 nightmare of being able to go back and change recorded history. And libraries, as third-party, nonprofit public services, have always played a role in making a record and making it publicly available. So where do you get the funding for this? Because it seems like a very large undertaking.

It is and it isn't. But yes, we get about one-third of our income from libraries paying us to collect web pages or digitize books and records for them, about one-third from major donors and foundations, and about one-third from end users. We do the same kind of NPR-style beg-a-thon at the end of the year: please, please, you know, please, please, please. And we have over 150,000 people a year who go and say, I want to support access to history.

And so it's about a $20 million to $25 million a year organization. Wikipedia is about 10 times that. But both of these are smaller than the San Francisco Public Library. Oh, wow. So even just San Francisco, that's not Alameda. We're tiny by comparison. So it is possible with this digital technology to make copies of these materials, preserve them, and even put them in other locations for long-term storage. So other copies of the servers, right? Or of the storage units, essentially, that you have. Yes, absolutely. You know, you only do a beg-a-thon once a year, Brewster. You've got to catch up. We do it like five times a year. Come on!

But it's not just websites, though. Right. Here we get into the nitty-gritty. There's more. It's basically, and correct me if I'm wrong, everything that's digitally available through the web.

We try to basically collect everything that's digitally available through the web. Now, we can't keep up with everything in, say, YouTube. But if it's linked to or linked from a tweet, then we try to get it. We also try to record worldwide television, in cooperation with libraries around the world, and make that searchable, so the C-SPAN you were referring to is also recorded. You can only get clips from us. It's searchable, and then we can send you a thumb drive, and you can borrow that program. If you want to reuse it for your documentary, then you have to go and license it. But it's available as a library, as a record. And it's very important that it's not just from one place only.

Because those are too easily manipulated, and they go out of business all the time. The Internet Archive is kind of the place for websites that are long since dead: GeoCities, people's old blogs, all these past sites, the SoundClouds, the Bandcamps, the internet music archives that existed 20 years ago and are long since dead. Those hold fantastic creative works that people love to be able to get back, when their old hard drives and phones are long gone. So music, and you mentioned television and books. Yes.

This is all stuff that intellectually also belongs to people. And as you said at the very beginning of the show, one of the goals of the Internet Archive is not only to preserve this material but to make it accessible to everyone. Isn't there a conflict there? Because we hear all the time about artists having their work listened to or read but not receiving a single penny of remuneration from that. Ah, well, that's how publishing has always worked. In the old days, back when we were growing up, publishers would make copies and sell them to libraries and individuals. The publishers would then pay some of that back upstream. And if there were lots of publishers, authors and musicians had multiple ones to pick from (fewer now, but that's a different issue). And then these libraries would preserve the works, because they had paid for them. The question is how we move into this digital era. What libraries do is make things less available than a bookstore or record store would. They're available to the researchers who want access to the old versions. They're kind of crufty versions, but it's very important to have a record of them. And we don't really compete with them. I don't think the blog of this radio program is complaining that the Internet Archive has it buried someplace in its collections, because people will go to WBUR's podcast to find it. That parallel path of libraries and publishing has existed for thousands of years.

But, of course, the difference is that we are a nonprofit, that we put our work out there for the public good, and we want as many people as possible to have access to it at zero cost. That's not necessarily how the publishing business works, as you know. So, Brewster, hang on here for just a second, because, as promised, we need to talk about the court cases that have come up regarding what the Internet Archive does.

And to get the legal view on that, I'm going to bring James Grimmelmann into the conversation. He's a professor of digital and information law at Cornell Tech and Cornell Law School, and he studies how laws regulating software affect freedom, wealth and power. Professor Grimmelmann, welcome to On Point.

Hi, it's great to be here. Okay, so first of all, there are two primary cases, really two cases we need to talk about. The first one is Hachette versus the Internet Archive. Tell us about that case. So this is a case about the Internet Archive's use of book scans.

The Internet Archive, in collaboration with other libraries and like many organizations, has been digitizing books. They get a physical copy of a book, put it in a book scanner, and take photographs of each of the pages. They recognize the words on each page, and now they have a digital record of what used to exist only in a physical book.

So Brewster was talking before about preserving the web. Those are things that were accessible online at one point, but when you're talking about physical books, they've never been previously available digitally, and so this is an additional way of having archival copies of them that can be preserved.

So in addition to digitizing the books for preservation, the Internet Archive also made them available for people to read in a way that is, metaphorically, the same as a library with physical books. You log in with your account, you check out the book, and then it's available for you to read on your computer until the end of your borrowing period. You return the book, and you can check out something else.

The idea is that people circulate a copy of this book in the same way that a library would circulate a physical book from its shelves. Wait, wait. Can I just jump in here? Just to be clear, is that digital copy you're talking about the same as how it works at your local library? So local libraries have done something similar with licenses from publishers. The publisher gives them a digital file for an e-book, and they will let one person, or some number of people, read it at a time, and they pay the publisher for the e-books that they lend out. It's a kind of imitation of the model with physical books, where the number of copies that circulate at any one time is limited.

But Brewster, is that what the Internet Archive is doing with its digital books? Because a court, and I'll read the ruling here in a second, found the Internet Archive basically in violation of copyright law with its book scanning program.

Yes, the Internet Archive, working with other libraries, basically has a physical copy, keeps that aside, and then lends the digitized photographs, as James put it, of these books to one reader at a time. So it's limited.
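The one-reader-per-physical-copy rule described here (the "one-to-one" ratio the court's ruling refers to) is, mechanically, a simple counting constraint. The Python sketch below models it with a hypothetical `LendingTitle` class. It illustrates only the counting rule, not how the Internet Archive actually implements lending, and it says nothing about the copying the court found decisive.

```python
from dataclasses import dataclass, field


@dataclass
class LendingTitle:
    """Hypothetical model: one digital loan per physical copy set aside."""
    title: str
    owned_print_copies: int  # physical copies held back from circulation
    active_loans: set[str] = field(default_factory=set)

    def checkout(self, patron: str) -> bool:
        """Return True if the patron now has (or already had) the loan."""
        if patron in self.active_loans:
            return True  # already borrowed; no extra copy goes out
        if len(self.active_loans) >= self.owned_print_copies:
            return False  # every owned copy is on loan; the patron must wait
        self.active_loans.add(patron)
        return True

    def return_copy(self, patron: str) -> None:
        """End a loan, freeing that copy for the next borrower."""
        self.active_loans.discard(patron)
```

With `owned_print_copies=1`, a second patron is refused until the first returns the book, which is the "one reader at a time" limit in the conversation.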

What's also a little different from what I think James said libraries do is that they actually don't even have copies of the digital books from those publishers. They just pass the readers on to the publisher's database products, to their web page, if you will, and pay for the right to do that. So in the e-book world, libraries never get a copy. They pay and pay and pay, but they've never bought a copy. Digital ownership is key here. Mostly, the Internet Archive's collections are old 20th-century materials. We link them into Wikipedia so that people can follow the Wikipedia links and see whether the book supports the statement that's there. They just get a snippet, and if they want to see more, they have to borrow or buy the book.

Okay. But the Internet Archive, to be clear, lost this case. So let me provide a little bit more background. I believe it was first filed in June of 2020 in the Southern District of New York. It wasn't just Hachette, the publisher. It was also HarperCollins, Penguin Random House, and Wiley. They were organized by the AAP, the publishers' representative. It involved 127 works from these publishers. And basically, what was found by a lower court judge and then affirmed by the Second Circuit was this. I'll quote the ruling here: "Is it fair use for a nonprofit organization to scan copyright-protected print books in their entirety and distribute those digital copies online in full for free, subject to a one-to-one owned-to-loaned ratio between its print copies and the digital copies it makes available at any given time, all without authorization from the copyright-holding publishers or authors?" Well, the court applied the Copyright Act, and it said the answer is no.

So, Professor Grimmelmann, explain this ruling to me. In the eyes of the Second Circuit (since the Internet Archive declined to appeal up to the Supreme Court), what is the violation of the Copyright Act here? The issue is that the Internet Archive's lending program looks and works a lot like traditional library book lending, but technically there are a bunch of computer implementation details that are different, and the court thought that those details make it fundamentally unlike library lending and not protected.

So libraries have always relied on another copyright defense, first sale. Once you buy a copy of a book, it is yours to sell, give away, or lend out as you see fit. So libraries would always buy books, and then first sale would protect their right to lend them out to any of their patrons.

First sale protects your right to work with that particular copy. If you buy one copy at the bookstore, you can sell that one copy. If you buy 10, you can sell or lend out those 10. The issue is that when you go to the digital world, the Internet Archive isn't distributing a physical artifact like a book with paper and ink to its readers. It's giving them digital access.

And the way that computers work when you want to give digital access to a file, it involves making a copy of the bits on that file on a different computer. And so the publishers argued and successfully persuaded the court that this is making a separate copy from the original file.

The original being either the paper book or the file on the Archive's servers. And that additional copy triggers copyright law and isn't protected by first sale. And how is that different from what Brewster was mentioning earlier? Is it that libraries, when they have digital copies that they lend out, are getting those digital copies from the publishers themselves? Yes.

Yeah, the libraries are getting permission from the publishers, which they have to pay for. They license the right to get people to read those copies. But really, they're not even licensing anything to their customers. They're just paying the publishers to give the libraries' patrons access. They're basically like points of sale for publishers: read a book for a couple of weeks. Okay. Well, Brewster Kahle, I have a statement here from the Association of American Publishers, which brought the suit. They said they're thrilled to see that the Second Circuit's interpretation, quote, "leaves no room for arguments that controlled digital lending is anything more than infringement, whether performed by commercial or non-commercial actors."

Now, we've got to take a quick break here. But when we come back, I want to get your response to that, Brewster. And then we'll talk about how the music industry is also applying legal pressure to the Internet Archive. That's all in just a moment. This is On Point.

You're back with On Point. I'm Meghna Chakrabarti. And before we get back to today's conversation about the Internet Archive, a quick heads-up on something we're working on for a bit later. U.S. Surgeon General Vivek Murthy says that the mental health of parents in America is a serious public health concern. Last year, he believed it was such a concern that he issued a Surgeon General's advisory on it, because, according to a recent study by the American Psychological Association, 48% of parents say that most days their stress is completely overwhelming. So we want to know if you are one of those parents. What are the sources of your stress? How does it impact your day-to-day life?

What kind of support do you need? Pick up your phone and get the On Point VoxPop app. Just look for On Point VoxPop if you don't already have it. And tell us your parental stress stories. Or you can also call us at 617-353-0683. Now today we are talking about what may be the world's most important archive of all of the digital information we put out there. It's a record of our digital histories and it's called the Internet Archive.

And it has been sued by the publishing industry and the music industry in two major cases.

The question coming out of those cases is: could the Internet Archive shut down? And if so, what would that mean for our collective memory, which is digitized on the web? I'm joined today by James Grimmelman, a professor of digital and information law at Cornell, and Brewster Kahle, the founder and director of the Internet Archive. So, Brewster, going back to that statement from the publishers: basically, and I'm paraphrasing here, they're saying there's no difference between the Internet Archive scanning these books and slapping a book down on a photocopier and pressing copy. And every book says you cannot do that; you cannot make copies of this book. So do you have an argument against that? Oh, yeah, absolutely.

The publishers lend out their electronic books, whether it's the Harry Potters or the like, using the same technology to keep multiple readers from reading one copy at the same time that we use for our digitized books, our dusty musties, our mid-20th-century books about World War II.

But I think the bigger picture here is that, yes, there was a New York court that sided with the publishers, but other courts side with the libraries.

For instance, in Europe, when almost exactly the same case came up, over lending digitized or digital books from libraries, the courts affirmed it both at the local level in Holland and at the European level.

China has allowed digitizing and lending for 15 years. India, also concerned with educating their public and supporting libraries, has also been supportive of educational exemptions.

A hundred years ago, the United States led in libraries, with the Carnegie libraries. And you have to remember that publishers have sued libraries over and over again about things like lending, and always have. But the legislatures and the judiciary in the United States 100 years ago said it was important to have libraries and archives.

They supported libraries, and we built the Carnegie library system. Which countries are going to lead in libraries in this generation is really unknown, and that's the big question. You know, philosophically, I agree with you. I am a giant proponent of making information easily accessible for the good of the general public.

But James Grimmelman, let me turn back to you here. It doesn't seem to me that that's what this case is about. As you said, it's about the existence of that one digital copy. Were the publishers' interests as narrow as that, or did they have some other strategy in play? Are they fearful that the Internet Archive, instead of digitizing, as Brewster said, those dusty musties about World War II history from 70 years ago, is going to move into digitizing Harry Potter?

Yeah, I think the publishers are concerned about what they see as a principle and a slippery slope. The Internet Archive is not going to put them out of business, but they're afraid that lots of other libraries will have lower standards, take fewer safeguards, and work with front-list titles, and that if they don't sue everyone who crosses their radar, eventually they won't be able to enforce any restrictions, because everyone will just download free copies of everything from the Internet. Well, I want to note that we did reach out to representatives and the legal team of the Association of American Publishers. They did not respond, but we do have that statement I read earlier. Okay, so the other case... By the way, that case is complete, I just want to remind everyone.

The Internet Archive declined to appeal to the Supreme Court, so the Second Circuit's ruling stands. Brewster, you lost. I'm sorry to put it so bluntly, but that is what happened. I'm going to ask you in a minute about the implications of that for the Archive. But Professor Grimmelman, on the music side of things, there's another case, from Universal Music Group, and it takes aim at the Internet Archive's Great 78 Project. What is this case about?

This case is about another of the Internet Archive's efforts to digitize and make available a source of old media, in this case early 78 RPM records. 78s were the first major generation of widespread commercial records, and they're an amazing history of the early sounds of recorded music.

The Archive collected lots of these old records, went to great effort to make digital versions of them, and put those on its website so people can experience them without having to track them down and risk destroying extremely old and fragile records. And to be clear, these are not in the public domain now? No.

Well, there's a mixture because some of these would be works that would now be public domain. Some of them were not part of the federal copyright system but were added to it by the recent Music Modernization Act. It's a very diverse and in some ways legally complicated set of works. Okay.

Well, so what's the issue here? Is it the same thing, that the Internet Archive is making a digital copy of these records, which, by the way, for those of you young enough not to know, we're talking about vinyl here? Is the existence of that copy in and of itself the problem, or is it the fact that now many more people have access to it?

In some ways, they're very similar cases. Both are about digitizing these old works and then making them available. There are some legal differences between the two, due to the different status of music in the copyright system, and some technical differences, but they're not fundamentally different in kind. It's an objection by copyright owners to other people making digital archives and then giving the public access to those archives. Okay, so we reached out to Universal Music Group. They did not respond. But we also reached out to the Recording Industry Association of America, the RIAA, and their chief legal officer, Ken Doroshow, sent us this statement, which actually speaks to some of the legal differences here. He says, quote, Congress took decisive action to protect pre-1972 recordings in the Music Modernization Act. And then he says the Internet Archive's, quote, unquote, mass scale copying, streaming and distribution of the thousands of pre-1972 recordings are blatant violations of those established rights. How do you read that, Professor Grimmelman?

I think he's saying that the Music Modernization Act singled out music for extra protection. I don't know if that's right. There are some ways in which music gets slightly heightened protection in U.S. copyright law, and a lot of ways in which it gets less than other kinds of works. The MMA reduced some of those disparities, but I wouldn't say that it elevated music and old recordings above everything else. Well, Brewster, the Recording Industry Association of America calls your Great 78 Project, quote, yet another mass infringement scheme that has no basis in law. What's your response to that?

We're a library. This project combines the contributions of 100 different libraries and collections that have participated in building it. And it's available in the same kind of way, to the same kinds of users, as their collections were when they were in their basements. One of the first collections that came in was from the Boston Public Library.

And note that this collection of 78 RPM records actually predates vinyl. These are the old shellac recordings, you know, the kind you play on a wind-up player with the horn, and the dog. They stopped being viable in 1950, and people don't even have the players for them anymore. So to understand what America sounded like, you had to go find these things and then either record them or destroy them by putting them on these old record players, winding them up, and listening to them in a crufty way. And people just weren't doing it. So in general, the idea is to make this available to researchers, who are about the only ones who care about the old crufty, crackly things. This project has been going on for 10 or 15 years, and when we demonstrated it at music industry forums and conferences, they loved it.

So why they're trying to put the Internet Archive out of business is a different sort of issue. It's not a money issue. Most of these things have only been listened to by researchers about 100 times. If you were to pay full Spotify rates for all of the things they're complaining about, then because people don't listen to crackly old records, it would come to about $10. Yet they're suing for $600 million. Why? So, Professor Grimmelman, that $600 million, is that what constitutes the threat of putting the Internet Archive out of business?

Yes, it is. Copyright has something called statutory damages, where the court is authorized to award up to $150,000 per work for willful infringement, even without proof that the defendant made that much money or the plaintiff lost that much money. It's meant as a kind of deterrent. And when you multiply $150,000 by thousands of recordings, you get into the hundreds of millions very quickly.
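To see how quickly the multiplication Professor Grimmelman describes adds up, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every recording draws the $150,000 statutory maximum; actual awards are set by the court per work and can be far lower, and the exact number of recordings in the suit is not stated in this conversation. At that ceiling, the $600 million figure corresponds to 4,000 works, and it is 60 million times Brewster Kahle's rough $10 estimate at Spotify rates. The function names are hypothetical.

```python
# Back-of-the-envelope statutory damages arithmetic (illustrative only).
# Assumption: every work draws the $150,000 statutory maximum for willful
# infringement; real awards are set by the court per work and may be far lower.

MAX_PER_WORK = 150_000  # USD, statutory ceiling per work for willful infringement


def max_exposure(num_works: int, per_work: int = MAX_PER_WORK) -> int:
    """Upper bound on statutory damages if every work gets the same award."""
    return num_works * per_work


def works_implied_by(total_claim: int, per_work: int = MAX_PER_WORK) -> int:
    """How many works a total claim corresponds to at a given per-work award."""
    return total_claim // per_work


claim = 600_000_000     # the $600 million figure cited in the interview
spotify_estimate = 10   # Brewster Kahle's rough estimate at full Spotify rates

print(works_implied_by(claim))    # 4000
print(max_exposure(4_000))        # 600000000
print(claim // spotify_estimate)  # 60000000
```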

So let me ask you, Professor Grimmelman, the same question I started with for Brewster at the top of the hour. Imagine for a moment that, given the precedent of the publishers' case, this Universal Music Group case goes against the Internet Archive as well, and it's forced to pay hundreds of millions of dollars, which it can't, and has to shut down. Hopefully there are other alternatives, but if it had to shut down, what would we lose, in your mind?

The other alternatives are going to be what the publishers would call pirate sites. They're going to be people who make archives completely illegally, without the professional standards of actually trying to curate and organize these large masses of material. We're going to have a huge morass of stuff out there, polluted and overrun with advertising and malware and deepfakes, and it'll all just be a huge mishmash.

I don't think they're actually going to stem the tide of anything. If anything, we're going to have a greater proportion of ephemeral stuff rather than the enduring historical classics. So it's not like they're going to stop piracy. They're just going to make our past more confusing, messier, and harder to access.

So Brewster, are you preparing for the possibility that you have to do something with these materials? Do you have a plan for what you might do if the Internet Archive does not have a future?

Well, we run a library, and you'll always come away with a book if you talk to a librarian. There's a wonderful book called The Library: A Fragile History. It basically describes what happens to libraries. It starts with the old Acadian libraries and the Library of Alexandria, and covers all the libraries in between. And what happens to libraries is that they're destroyed, and they tend to be actively destroyed by the powerful. Hmm.

And it used to be church and government, and now it's corporations and government. And so that's what happens to libraries. So design for it. And what libraries have generally done is they've tried to work with each other, but a lot is lost.

Take the history of Chinese libraries over the millennia, since they have a long history. The libraries would be built up, and then there'd be a new dynasty in town that doesn't like the old stuff around and doesn't want it available.

So they destroy the libraries. People steal the books, and in old China those people were often punished with death if they were caught with them. Then when the next dynasty comes around, it wants the books back and starts to rebuild the libraries again. Will that happen in this country? Well, probably. When? Don't know. Right.

Other parts of the world go through different cycles, though, which is why it matters to have collections and materials archived in different places. As I said, the Europeans are taking a very positive view of libraries at a time when the United States is not. Well, you know, my mind is suddenly drawn to another great classic, Fahrenheit 451, right? In that world, human beings memorized the books that were burned, so essentially, the material was contained somewhere else.

We just have a few seconds left. James Grimmelman, I'll let you have the last word. There are other major organizations; I think of the Library of Congress. Does that institution, whose duty is to document the history of this country, have a role here as a container for this information? Yeah.

They could, or should. There was an effort to create a Digital Public Library of America a few years ago. There is an absolute need for this to happen. The Internet Archive has stepped up to fill some of our archival needs, but there's much more to do than it can do alone. Well, James Grimmelman is a professor of digital and information law at Cornell Tech and Cornell Law School. Professor Grimmelman, thank you so much for joining us.

My pleasure. And Brewster Kahle, founder and director of the Internet Archive. Brewster, it has been a great pleasure to speak with you. Thank you so much. Thank you. And once again, as a reminder, we did reach out to the legal teams for both the Association of American Publishers and Universal Music Group. They did not agree to join us today. We had that statement from the AAP, and we also have the statement from the Recording Industry Association of America. This is On Point.
