Thursday, July 3, 2014

Rotten Links (Are Big Time-Sinks)

It's no secret that web links can be unreliable. The Chesapeake Digital Preservation Group, which has been reporting on website "link rot" since 2008, said in its 2013 annual report that nearly half of the links from its original website sample list no longer work, including a number of government and educational websites. A similar study of links cited in U.S. Supreme Court opinions from 1996 to 2010 found that nearly one-third of them no longer worked. As the A.B.A. Journal reported in December, groups including Chesapeake and Perma.cc (of which Duke Law is a member) are working to combat the problem going forward, but in many cases the damage has already been done.

So what can researchers do when they encounter a dead website URL? A blueprint can be found in chapter 6 of the latest edition of Levitt & Rosch's reference work The Cybersleuth's Guide to the Internet: Conducting Effective Free Investigative & Legal Research on the Web. For pages that were changed or moved very recently, you may be able to access a cached version through your preferred search engine. Google, Bing and Yahoo all provide temporary "cached" copies of a page as it appeared the last time their crawler visited it. In Yahoo search results, a "Cached" link is displayed prominently for each page; on Google and Bing, the cached option must be accessed through a drop-down arrow next to the page's URL.

Cached versions of pages change frequently. To view versions of a web page that are older than the available search engine caches, try the Wayback Machine, which provides archived versions of specific web pages, dating back to 1996 in some cases. Enter the website URL in the search box to view a timeline of available archived versions. For example, the Goodson Law Library home page has been archived as far back as February 1999 (when it was known as the Duke University Law Library, or D.U.L.L.).

Note that many websites ask to be excluded from the Wayback Machine, and even archived versions of pages may not always display properly. (For an example, note the broken image files in this early 1999 snapshot of the Duke Law Library site; by the snapshots from 2000, the images load quickly and correctly.) Sometimes top-level archived pages will display properly, but lower-level pages will return an error message. In addition, dynamically generated content (e.g., results from a site's built-in search) and downloads such as PDF files may be inaccessible via the Wayback Machine. However, the site remains a great option for accessing older versions of known URLs.
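If you have many dead links to check, the Internet Archive also offers a simple "availability" API that returns the archived snapshot closest to a given date. The Python sketch below assumes the endpoint https://archive.org/wayback/available and its JSON response shape (an "archived_snapshots" object holding a "closest" entry); neither is described in this post, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the API query URL; timestamp is YYYYMMDD (optionally plus hhmmss)."""
    params = {"url": url}
    if timestamp:
        # The API is assumed to return the capture closest to this date.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a JSON response, or None if no capture exists."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def timeline_url(url):
    """Wayback Machine calendar view listing all captures of a page."""
    return "https://web.archive.org/web/*/" + url


def find_snapshot(url, timestamp=None):
    """Query the API over the network (hypothetical helper)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, find_snapshot("example.com", "19990201") would ask for the capture nearest February 1, 1999, and returns None when nothing is archived (as with sites that have been excluded). The same caveats about broken images and missing PDFs apply to whatever snapshot comes back.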

For help with tracking down a broken link, be sure to Ask a Librarian.
