Web Archiving at the University of Kentucky Libraries

This LibGuide provides an overview of the University of Kentucky Libraries Web Archiving Program for archival and library colleagues, students, and researchers looking for more information on how and why web content is archived.

Why archive the Web?

Currently, the University of Kentucky Libraries are mandated to archive all University of Kentucky permanent records as well as collect, preserve, and provide access to the social, cultural, and political history of the University and of the Commonwealth of Kentucky broadly.  The responsibility of collecting these materials belongs to the Special Collections Research Center (SCRC).  The introduction of online meeting minutes of decision-making bodies; magazines, newspapers, and newsletters; course catalogs; press releases; photographs, video, and podcasts has resulted in an increasing amount of new material that is only accessible through the World Wide Web.  Because the Web is dynamic and ever-changing, these online documents are at constant risk of being altered, restructured, deleted, or taken down. 

The University of Kentucky Libraries Special Collections Research Center has been actively preserving this web-based content since June 2018 using the web archiving service Archive-It, launched by the Internet Archive in 2006 and powered by the Internet Archive's Wayback Machine.  Although the Internet Archive had already captured some of the UK web domain (www.uky.edu) starting in 1997, these captures were inconsistent and incomplete.  Archive-It allows the SCRC to proactively select and "crawl" websites and capture their content while keeping the structural arrangement of that content intact, so that the entire website can be "replayed" and viewed the way it looked and operated at the time of the crawl and capture.  This allows the SCRC to preserve UK and other Kentucky-related websites as accurately, completely, and permanently as possible.  The websites are also captured at regular intervals in order to preserve the various versions of each site and page as they change over time.

Archive-It is dual-sided:  it has an administrative back-end platform for harvesting websites and building collections, and a public-facing website that provides open access to published content.  The Wayback Machine (named for the time-traveling WABAC machine from the "Peabody's Improbable History" segments of the Rocky and Bullwinkle cartoon) is designed to "visit" past websites.  It is both a digital archive and an application that can read WARC (Web ARChive) format files.  WARC files are generated during a web crawl.  The Wayback Machine reads these files and "replays" the captured pages under a new, permanent Wayback URL, where they look and function as they did on the day they were captured.
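As a rough illustration of what a permanent Wayback URL looks like, the short Python sketch below assembles one from a capture time and an original web address.  It assumes the address pattern used by the Internet Archive's general Wayback Machine (a 14-digit UTC timestamp followed by the original URL); Archive-It collections are replayed under their own addresses, which are not shown here.  The function name and the capture date are hypothetical and chosen only for illustration, not taken from an actual SCRC crawl.

```python
from datetime import datetime, timezone

# Prefix used by the Internet Archive's general Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, captured_at: datetime) -> str:
    """Build a replay URL from a capture time and the original address.

    Hypothetical helper for illustration only. The timestamp is written
    as YYYYMMDDhhmmss in UTC; captured_at must be timezone-aware.
    """
    timestamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Hypothetical capture time (not a real crawl date).
capture_time = datetime(2018, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
example = wayback_url("http://www.uky.edu/", capture_time)
print(example)
```

If no capture exists for the exact timestamp requested, the Wayback Machine redirects to the nearest capture it holds for that address.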

Basics of Web Archiving Presentations

Because the University of Kentucky Libraries and the Special Collections Research Center were starting from zero with web archiving, intern Emily Collier (now Assistant University Archivist) created seven presentations as part of her initial research and development for the University of Kentucky Web Archiving program in 2019.  These presentations provide background and context for web archiving.  Note, however, that because the web, social media, and web archiving change rapidly, some of this information may now be obsolete.  The presentations are copyright University of Kentucky and Emily Collier and are available for use under the Creative Commons license CC BY-NC.