Breaking the internet

2025/2/19
Today, Explained

People
Addie Robertson
Mark Graham
Sean Rameswaram
Topics
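Sean Rameswaram: I've noticed that the Trump administration is scrubbing government websites, which is accelerating digital decay and endangering our record of ourselves. Government webpages are disappearing, and that is part of a larger problem online: digital decay, or link rot. Our internet is disappearing, and we need to understand why that matters and what we can do about it.

Addie Robertson: Executive orders signed by Trump have led to sweeping cuts of government information on racial equity, transgender people, and related topics, and the administration is also scrubbing content on climate change. Where data has been restored, the sites carry a disclaimer saying the administration rejects the information. Every administration changes data to fit its priorities, but the scale of this purge is unprecedented, especially for records and scientific research, and it even affects information people need to do their jobs. Nonprofits and journalists are working to preserve this information, and the Internet Archive and its Wayback Machine play an important role. The internet suffers from link rot, or digital decay, in which webpages disappear or become hard to find; a Harvard study found that half the hyperlinks in Supreme Court cases are broken. Pages vanish for many reasons, including sites shutting down and URLs changing. Link rot is a long-standing problem, but recent circumstances have made it worse. Search engine optimization has led some sites to delete old articles to improve their search rankings, destroying information. The commercialization of the internet means some old websites are revived only to have their content replaced with AI-generated articles, or their links are hijacked. Lost web content sometimes contains statistics or evidence, and it also has cultural value.

Mark Graham: The Internet Archive's Wayback Machine preserves the web continuously, so earlier versions can be found even after a site shuts down. It archives millions of URLs and webpages every day and keeps extending its coverage by following links, and it indexes what it archives so users can look it up. The National Archives and Records Administration also does archiving, but the Wayback Machine is one of the main players archiving the public web, including U.S. government websites. Some change to government websites is normal, but the current degree to which they are disappearing, coming back, and proving unreliable is unusual, and more sites seem to be going offline than before. The Wayback Machine relies mainly on donations; whether it can adequately archive the internet depends on how you define "enough," but its goal is to be the best library it can be. Lost webpages, even fleeting ones, are gaps in the historical record and in the integrity of our information.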

Transcript

President Donald Trump has been back in office for one month. And what a year it's been. We've covered a lot of Trump on Today, Explained this past month, from pardons to executive orders to Greenland to Guantanamo to tariffs to MAHA to Elon and Elon and even more Elon. But today we're going to talk about the websites. DEI would have ruined our country, and now it's dead.

I think DEI is dead, so if they want to scrub the website, that's okay with me. Government webpages are disappearing. Sometimes they come back, sometimes they don't, and it's part of a greater problem we have online. Some call it digital decay, others call it link rot. Whatever you call it, our internet is disappearing, and on the show today we're going to help you understand why it matters and what we can do about it.

Sean Rameswaram here with Addie Robertson, senior editor at The Verge, here to tell us about the websites. What is going on with the government's websites? So Trump signed a couple of executive orders, one of which officially defined the idea that there are only two genders, male and female, and another that ends, quote unquote, diversity, equity, and inclusion in the government.

We will forge a society that is colorblind and merit-based. And so the result here has been that, more or less across the government, in addition to the kind of thing we saw in the first Trump administration, which included purging information about climate change and some other general climate-related issues, we've seen a massive cut of anything that involves racial equity or transgender people, or really anything that is a subject of Republican culture wars.

The CDC is currently scrubbing information from their website to be in compliance with a recent executive order. Here are some of the pages that have gone down. The Trump administration has taken ReproductiveRights.gov offline. They also have scrubbed federal websites for any search...

A lot of this initially happened very quietly.

Reporters noticed it. People who used the information on these sites, which included CDC data or even transportation statistics, ended up uncovering a lot of it. From there, the Trump administration has mostly addressed it in response to lawsuits claiming that the data was deleted improperly. There was a court order that required them to put it back up.

And they have responded by putting it back up with a big banner that says, in effect, "We reject this information. We were forced to keep it online, but it violates something like, say, our dictate that there are only two sexes, so we find it unscientific or against our policies." Any information on this page promoting gender ideology is extremely inaccurate and disconnected from the immutable biological reality that there are two sexes: male and female.

Is there a presidential precedent for something like this happening? Or are Donald Trump and DOGE and Elon Musk and the gang the first administration to come in and just start ripping apart websites?

First of all, just for context, every time there is a new presidential administration, some data changes to reflect its priorities. New programs start and old programs get retired. So it's not necessarily surprising that some things have changed. But as part of this, we have seen a massive and really unprecedented deletion of information, including information that people outside the White House need to do their jobs. And so it's a really huge issue right now.

I don't think we've ever seen this kind of scale of data purging, especially of records and scientific research. Obviously, the first Trump administration deleted some data in ways that seemed very ideological, aimed at suppressing information about climate change. The White House and other federal agencies are also revamping their websites, for instance, scrubbing mentions of climate change.

And Trump is blasted. And obviously there have been pages that just disappeared at the end of terms, but that tended to be more about oversight. It tended to be more that there was a changing of the guard and they didn't really know where everything was. So some websites are disappearing. Some websites are disappearing and coming back. Some websites are still up. Is there anyone who has like a full grasp of what exactly is gone forever? Yeah.

There are nonprofit groups and journalists working to preserve this information. There were already groups before Trump took office this time, like the Environmental Data and Governance Initiative, because we saw a little of this in Trump's last term. And so there was a preemptive effort to preserve information, including not just web pages but also collections of data from agencies like the CDC.

So there are all of these, not necessarily fragmented, but individual and private efforts. And one of the really big load-bearing institutions here is the Internet Archive and its Wayback Machine, which has always maintained a project that archives data at the end of every term, but has now become a place where you can go check what's disappeared, and has become part of this process of identifying and trying to recover data. Beyond the American people perhaps needing access to some of this information, beyond any number of institutions needing access to it, this points at a bigger problem we have on our internet right now, right? Something called link rot?

Link rot, or digital decay, is a general phenomenon where web pages either disappear or move in a way that makes them more difficult to find. And so the internet, which is a series of links that point to information, ends up with all of these little dangling ends and dead links, places where you can no longer find information that someone has referred to, or where you can no longer find a record of it at all.

A 2013 Harvard study, for example, found that half the hyperlinks in Supreme Court cases, today's equivalent of footnotes, are broken, a phenomenon known as link rot.

Why do web pages disappear? The most obvious case is when a page is just taken down, sometimes because the entire website went under, sometimes because the owners think that page is no longer valuable. Government agencies remove documents and companies fail, and with them the sites they host. Think of GeoCities, Yahoo Video, and more recently, the news site Gawker. There are also cases where just the URL, the link that points to that information, changes, so it's harder to find. If you previously linked to it from another web page, that link is just not going to go there anymore. The wonder of it is it's very, very simple. Anybody could go and set up a web server on their computer and make it available to the world. Unfortunately, it's too simple. It's fragile. If something happens to that piece of equipment, that website, blink, is gone. So you've been covering this issue, Addie, for more than 10 years.

Is link rot getting worse online, or is it sort of, you know, continuing apace? Link rot is an issue people have been identifying in some ways since really the beginning of the internet. But for at least a decade, a really significant proportion of web pages and links have stopped working. I think the latest research was something like 38% of web pages that existed in 2013 are no longer available. This is, I think, not necessarily an issue that has suddenly snowballed, but I think we're seeing some unique circumstances now that have added to it.

One of them is search engine optimization, where Google rewards pages, or at least people think it rewards pages, that are regularly refreshed or seem to provide new information. So, for instance, CNET, which is a really venerable tech publication, removed a bunch of its older articles because it wanted to rank more highly in Google search results. And so there was this sense that, okay, it makes people more likely to find current articles, but also this whole trove of information disappears. Right. I mean, I think we can all, you know, mourn the loss of our GeoCities homepage from 2003. Yahoo! Yahoo!

But it's a lot rougher when, like, I don't know, some billionaire buys an alternative newspaper and just decides one day to shut down its website. Sometimes it's a billionaire that buys something and shuts it down. There are also more insidious phenomena that I think really speak to the commercialization of the internet and the cannibalization of anything that can be turned toward profit. So you have old websites that, say, have a name people recognize, and then they get resurrected, but they no longer have the old information. They've been filled with AI-generated new articles that capitalize on the old name, as a kind of zombie site. Or you have cases where a link goes dead and somebody tries to hijack it: they either contact the website administrator or find some other way to get the link to point to a new page that builds their own credibility but doesn't provide the original information. So there are all these cases where archival gives way to profit.

That information was useful sometimes because it provided statistics or evidence, say, when you're looking at Wikipedia and there's a dead link that no longer provides the information it used to. And sometimes it was useful just because these things are a valuable record of what the internet used to be and of how people lived. There are a lot of things that at one point would have been written down on paper or in some other medium, a hard document that people can look back on. But at this point, a huge amount of our culture takes place on the internet, and the internet is a very fragile place.

Addie Robertson, senior editor at TheVerge.com. When Today Explained returns, we're heading into the Wayback Machine to hear from the people trying to archive the entire internet one webpage at a time. ♪

This is Today Explained.

So let's just have you start by saying your name and what it is you do. Sure. Yeah. Hi. My name is Mark Graham, and I am the director of the Wayback Machine at the Internet Archive, a not-for-profit that has been preserving the web since 1996. Journalists use it all the time. But for the uninitiated, I asked Mark to show us around the Internet Archive.

Where do I begin? It's like walking into a very large library and saying, show me your favorite book. Well, for example, last year it was a big news story that MTV News was shut down. The founding editor of MTV News wrote about it on LinkedIn, and a lot of other editors were talking about it, like, oh my God, all of our articles are gone. They're missing. And I just casually, you know, waded into the conversation and said, hi, check here, Wayback Machine. And they were like, oh my God, you guys got it all? Pretty much, yeah. And then people asked, well, what did you do when it went down? I said, we didn't do anything when it went down, because we've been doing our job all along. We've been working to archive the public web as it's published, on an ongoing and continuous basis.

So if we have to start paying attention to something after it's gone down, that means we screwed up. So with that example, MTV News, give us a sense of what you guys were doing before that website went down to make sure people could find out, you know, I don't know, what Everlast was singing about in 2004. Hello, Jancee Dunn here, and joining me now is former House of Pain member Everlast. Welcome, Everlast. Thank you.

So for any one of thousands of reasons, we set our web crawlers and archiving software out on a mission every day to identify and download web pages and related web-based resources. We bring in millions and millions of URLs every day that are signals to us, signals of where new material is being published on the web.

And we make sure that we archive all of those URLs, all the web pages associated with those URLs. And then we look at those pages and we identify links to other pages. And then we go to those pages and we archive them, etc., etc., etc. That's where you get this metaphor of crawling like a spider throughout this web. And the net result of it is that we add more than a billion archived URLs to the Wayback Machine every day.
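Mark's description of crawling (archive the pages behind a set of seed URLs, pull out their links, then archive those pages in turn) is essentially a breadth-first traversal of a link graph. The Python sketch below is only an illustration under simplifying assumptions, not the Internet Archive's actual crawler: the hypothetical `links` dictionary stands in for fetching a page and extracting its links, and "archiving" just means recording the URL.

```python
from collections import deque

def crawl(seeds, links, max_pages=1000):
    """Breadth-first crawl of a link graph, starting from seed URLs.

    `links` is a hypothetical stand-in for fetching a page and extracting
    its outgoing links: it maps each URL to the URLs that page links to.
    Returns URLs in the order they were "archived".
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated, order preserved
    seen = set(frontier)                    # URLs already queued or archived
    archived = []
    while frontier and len(archived) < max_pages:
        url = frontier.popleft()
        archived.append(url)                # stand-in for saving a snapshot
        for link in links.get(url, []):     # links found on this page
            if link not in seen:
                seen.add(link)
                frontier.append(link)       # archive it later, in BFS order
    return archived

# A tiny made-up web: two sites that link to each other and to a third.
graph = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/", "c.example/"],
    "b.example/": ["c.example/"],
}
print(crawl(["a.example/"], graph))
# → ['a.example/', 'a.example/news', 'b.example/', 'c.example/']
```

A real crawler would also have to fetch pages over the network, handle failures, and decide how often to revisit each URL; the `seen` set here only keeps the sketch from queueing the same URL twice.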

And this material, as it's added to the Wayback Machine, is indexed and it's immediately available to people who go to web.archive.org, enter in a URL, and then are able to see a history of archives that we have of the web page that was available from the URL at any given time.
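The lookup Mark describes can also be done by URL. This minimal sketch only builds query URLs and makes no network requests. It assumes the publicly documented Wayback Machine patterns: snapshot links of the form `https://web.archive.org/web/<YYYYMMDDhhmmss>/<url>`, which the service resolves to the nearest capture it holds, and the CDX server endpoint `https://web.archive.org/cdx/search/cdx` for listing captures. The helper names and the default `limit` of 10 are illustrative choices, not part of any official client.

```python
from datetime import datetime
from urllib.parse import urlencode

WAYBACK = "https://web.archive.org"

def snapshot_url(url, when):
    """Link to the capture of `url` nearest to `when`.

    Snapshot links embed a 14-digit YYYYMMDDhhmmss timestamp; the
    Wayback Machine redirects to the closest capture it actually has.
    """
    return f"{WAYBACK}/web/{when.strftime('%Y%m%d%H%M%S')}/{url}"

def history_query(url, limit=10):
    """CDX server query listing up to `limit` captures of `url` as JSON."""
    params = urlencode({"url": url, "output": "json", "limit": limit})
    return f"{WAYBACK}/cdx/search/cdx?{params}"

print(snapshot_url("http://example.com/", datetime(2004, 1, 1)))
# → https://web.archive.org/web/20040101000000/http://example.com/
print(history_query("example.com"))
# → https://web.archive.org/cdx/search/cdx?url=example.com&output=json&limit=10
```

`urlencode` percent-encodes the target URL so characters like `/` and spaces survive as a query parameter.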

I want to talk about government websites now, because that's sort of the reason we're having this conversation today. I think most people probably assume the government will take care of archiving government websites. But here we are in a new administration, and websites are disappearing, coming back online, and people are worried. When you, an archivist of the internet, see government websites disappearing, coming back online, becoming unreliable, how do you react to that? Is that better or worse than regular, non-governmental websites going offline?

Well, as an American, my tax dollars help pay for some of this stuff, and much of it is maybe a benefit to people. So certainly my first reaction is, hmm, that might not be such a good thing. I do want to underscore that the National Archives and Records Administration has a program that does archiving as well. But for whatever reason, we seem to be one of the main players in the space of trying to archive much of the public web, including, and right now especially, U.S. government websites, and making those archives available in near real time.

Were you caught off guard when you saw the new administration removing web pages, removing websites? This is pretty normal in some respects. It's normal and expected, and frankly, it's what's happened with each administration in the time we've been working on this effort. I mean, look, it's under new management, right? For example, you wouldn't expect the WhiteHouse.gov website under any new presidential administration to be the same as it was before.

So we go out of our way to try to anticipate how frequently web pages should be archived, so that we have a pretty good shot at getting those changes. You're saying, you know, the WhiteHouse.gov site obviously changes from administration to administration. I think to some degree people understand that, that Joe Biden's administration probably wouldn't have been posting trolly valentines about immigration to its Instagram account, you know, a year ago this time. But what we're seeing here is websites that people need, websites that record public health information, going offline, briefly, permanently, what have you. No, that's true. Is that a different degree of sort of erasing or messing with the historical record than we've seen? I don't know.

It's different. It's certainly different in terms of the number, seemingly. I mean, we're still in the early stages of this administration. But yeah, I'd say on the face of it, you're right. Historically, we haven't seen major U.S. government websites taken offline like we did, for example, with USAID. But I'm going to leave that kind of analysis to others and really just focus on trying to archive the material.

The Wayback Machine, the Internet Archive, is mostly funded through donations, the generosity of people, institutions, even governments. Is that going to be enough to archive the internet to the extent that, you know, future generations will want and need? Enough is a very subjective term. Well, as an archivist, for me, it's never enough, because you don't know. No one knows what is going to be of use, value, or importance in the future, maybe even the near future of tomorrow, much less the very far-off future.

And since millions of people use our site on a daily basis, we get a lot of feedback from them. It motivates us, but it also helps direct us and inspires us to continuously try to do a better job at being the best library that we can be. Godspeed. There you have it. Let me ask you one last question, Mark.

You guys have been at this for nearly three decades. Certainly you've saved a lot of stuff, and certainly a lot of stuff has fallen through the cracks. I wonder, is there something that slipped through the cracks that you could tell us about that might suggest to our audience what is lost when we can't archive to the extent we want to or need to?

Okay, so that question kind of caught me off guard. I'll just say, I don't know right now. I can't name that thing. Gosh, I wish. Okay, I got one. I mean, this is just recent history. Apparently, there was a page up on the CDC website about bird flu last week. It was apparently only up for a few minutes, and no one got it. Huh.

And by losing that fleeting webpage, that one, you know, maybe minor, maybe major webpage about bird flu on the CDC's website, what are we losing? Well, we're losing part of the story, right? We're losing part of our understanding of the evolution of arguably a significant health crisis.

We don't know where this is going to go. I don't know. I guess that's the other point, right? I mean, you don't necessarily know now what is going to be very important in the near or longer term. In the time of Martin Luther, there were raging, raging debates. And much of that debate took the form of things that were written on pamphlets. The pamphlets at the time were considered books.

Yeah.

I mean, and you are comparing, in a way, a CDC website to the Protestant Reformation. But I think you mean it, don't you? I do, because I don't know. And one really can't know without the benefit of the long historical view. And that's not something that we have access to today. Why? Because we don't have a real time machine.

Oh, and it's Today Explained's seventh birthday today. What did you get us? Maybe show some love in the comments, in the ratings, in the reviews. They say it helps. And thank you for listening for however long you've been listening. If you're new to the show, feel free to browse the archive.
