A Link Rot Bestiary/Chapter 1 : Introduction

"Something’s Rotten with the State of Our Archives"
Michael J. Oghia, November 2024

Link rot occurs when a URL no longer works. The concept is simple, but in practice it gets complicated. In an ideal world a dead link returns a "404" status code, or gets no response from the server, and there is no question that the link is dead. In practice link rot takes many forms: crunchy 404s, soft 404s, soft redirects, inferred soft redirects, bot blockers, and so on. Identifying, naming, and classifying these scenarios is a first step towards building solutions.
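To make the idea of classification concrete, here is a minimal sketch in Python. It is not the method used by any particular tool. The `Response` record, the category labels, and the phrase list are all hypothetical, and the working definitions are assumptions made for illustration: a "soft 404" is taken to be a page that returns status 200 but says the content is missing, and a "soft redirect" is taken to be a deep link that ends up at the site's home page. The code works on an already-fetched response, so it makes no network requests.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical phrases that hint at a "not found" page served with status 200.
# A real list would need tuning per site and per language.
NOT_FOUND_PHRASES = ("page not found", "no longer available")


@dataclass
class Response:
    """A hypothetical record of one fetch attempt."""
    requested_url: str
    final_url: str            # URL after any redirects were followed
    status: Optional[int]     # None means no response from the server
    body: str = ""


def classify(resp: Response) -> str:
    """Sort a fetched response into a rough link rot category."""
    if resp.status is None:
        return "dead (no response)"
    if resp.status == 404:
        return "dead (404)"
    if resp.status in (401, 403, 429):
        # Could be a bot blocker or paywall; the page may be fine for humans.
        return "unknown (possibly blocked)"
    if resp.status == 200:
        text = resp.body.lower()
        if any(phrase in text for phrase in NOT_FOUND_PHRASES):
            return "suspected soft 404"
        requested_path = urlsplit(resp.requested_url).path
        final_path = urlsplit(resp.final_url).path
        # Assumption: a deep link that lands on the home page is a soft redirect.
        if requested_path not in ("", "/") and final_path in ("", "/"):
            return "suspected soft redirect"
        return "probably live"
    return "unknown"
```

For example, `classify(Response("http://example.com/article/1", "http://example.com/", 200, "Welcome"))` returns `"suspected soft redirect"`. The labels are deliberately hedged ("suspected", "probably") because simple rules like these are easily fooled.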

Per the above essay by Michael J. Oghia, link rot is a big problem for global society. It may seem arcane, and it is, but the loss of digital information matters in the same way that the permanent loss of books and manuscripts matters. And for many reasons link rot is hard to solve.

Fully automated solutions have significant limitations. They are easily fooled by bot blockers, paywalls, soft 404s, and many other traps. Semi-automated solutions such as Wayback Medic can achieve high rates of accuracy, but at the cost of manual labor and limited scalability. Combining the two is one approach being explored.