More website links are expiring. Is it a bug or a feature of the internet?
Jul 11, 2024

Clare Stanton at Harvard's Library Innovation Lab explains why link rot isn't surprising and the challenges in archiving the web.

The internet is full of all manner of unsavoriness that is surely corroding our minds and societies.

But the kind of rot we’re talking about here is link rot — the disappearance of online content when links turn into “404 Page Not Found.”

A recent study from Pew Research suggests almost 40% of all webpages that existed in 2013 are no longer accessible. That includes important government links, citations on Wikipedia and hyperlinks in news articles.

Marketplace’s Lily Jamali recently talked about this with Clare Stanton, product and research manager at Harvard Law School’s Library Innovation Lab, who also works on a webpage preservation project, perma.cc. The following is an edited transcript of their conversation.

Lily Jamali: What are the major reasons for link rot?

Clare Stanton: Well, there’s a lot of reasons for it. People stop paying for the domains, or people kind of intentionally take content down because they’re trying to hide it. But we at perma.cc, which is a project that deals with helping people trying to prevent link rot, we talk a lot about how that’s really just the design of the web. At the end of the day, it is intentionally decentralized, where anyone can post anything to the web and anyone can take it down. And also it’s kind of meant to be this space that has the most up-to-date information. We think about digital objects as having the ability to be edited, which is good. You know, you want the most up-to-date information to be what’s displaying on your website. But of course, when things change and you take that down, someone out there might still need that old information, even if it’s not the most up-to-date one.

Jamali: It’s interesting to hear you say this was kind of always part of, maybe not the intention, but the design, the decentralization of the internet leads to this.

Stanton: Yeah, exactly. When the web first began, that really was a feature of it. And now I think we’re kind of starting to really understand deeply how the downsides of that affect our historical record and our cultural heritage, which now, a lot of times, lives on the internet.

Jamali: So you have been deeply involved in helping archive and preserve versions of these different sites when they no longer function. How difficult is that work? Can you talk about that?

Stanton: Well, just as much as the content of the web is changing all the time, so is the infrastructure and the technology that keeps it running. And so there’s kind of a cat-and-mouse game a lot of times when you’re writing and maintaining software to archive it. You’re also writing and maintaining software to archive the most recent version of browsers, of web technologies, and the biggest problem is that we can’t move fast enough to archive what we need.

Jamali: Can you give me an example of something that you’ve worked on in the recent past that kind of tells the story about how the infrastructure of the web is making it hard to keep up?

Stanton: Well, sure, I think that for a lot of good reasons, a lot of websites have started to really think about how to manage bot traffic to their website. There’s a lot of just nonhuman players in browsers and on the internet these days, and websites like social media websites that don’t want robots posting to their platforms also flag a lot of technology that’s able to archive those pages as robots and block it. And so there’s, again, I use the phrase “cat and mouse,” where we’re kind of always trying to position ourselves to actually access the websites that we want to be archiving when we are technically running bots to go do it.

Jamali: Are there particular kinds of websites that are more prone to becoming unavailable?

Stanton: So there’s been research done around this question. It’s very difficult to really pinpoint with broad strokes, but we’ve done some research about this, and have found that government websites and also websites that are associated with academic institutions, those websites tend to have a higher link-rot rate. Those kind of, again, by design, change hands consistently, even though the website handle remains the same. So you know, harvard.edu or whitehouse.gov will always be there and will always bring you to those institutions as they exist. But of course, the whitehouse.gov website gets handed from administration to administration, as does the Harvard website. Everything is inevitably turning over very consistently, and that, again, is kind of built into the design that there might be higher levels of link rot.

Jamali: So would you say overall, the problem of link rot is getting worse?

Stanton: I think that it is a fact that as time moves on, things that have been on the internet longer are much more likely to disappear. I don’t think we have information about if it’s accelerating in any real way, but the web is changing a lot. I mean, the internet of the early 2000s is very different from the way that people interact with the digital world now. I mean, there’s a lot of folks who don’t necessarily use a browser to look at the internet anymore, and our digital lives are in proprietary apps instead. You’re on your phone in a way that is not the same as surfing the web. And that’s a whole other corner of this digital infrastructure that holds our information, it holds our heritage, it holds who we are as a society, and that’s a whole other nut to crack.

Jamali: What sorts of problems arise, especially ones with important information or context about specific times in our history, what sorts of problems arise when those sites expire?

Stanton: There are instances where you experience link rot, and it’s kind of, you know, it can be frustrating, but you kind of just move on with your day. But the real impact of all of this is to the historical record, and it’s to our understanding of our culture, of the “now,” when we’re in the future. So one of my colleagues at the Harvard Law School Library, professor Jonathan Zittrain, wrote about how, and I think the quote was, “society can’t understand itself if it can’t be honest with itself,” and you can’t be honest with yourself if you only live in the present moment. And the real implication of the web changing so much is that in the future, we won’t be able to look back and see it over time. We’ll only be able to see what is existing in that moment. And I think that we all would agree that not being able to look at a historical record is a downside for society as a whole.

More on this

In that Pew Research study, researchers also found around 54% of Wikipedia pages have at least one dead link in their references section. They also found 21% of government webpages and 23% of news sites contain at least one broken link.

But link decay can come for anyone. Last month, MTV News’ website — and the nearly two decades of articles and content hosted there — disappeared when its parent company, Paramount Global, shut it down. MTV News closed last year due to financial problems, but the shuttering of the website came pretty suddenly.

In the days after, however, the Internet Archive stepped in to do what it does and created a searchable collection on the Wayback Machine that preserves this piece of music and pop culture history.

The team

Daisy Palacios Senior Producer
Daniel Shin Producer
Jesús Alvarado Associate Producer
Rosie Hughes Assistant Producer