“We are trying to capture content that is in danger of disappearing,” says Larissa Ringham, Digital Project Librarian at UBC Library’s Digital Initiatives. Over the past year, a team at UBC Library has been creating web archives to preserve materials relating to the COVID-19 pandemic response in British Columbia for future research.
UBC Library began web archiving publicly available content in 2013, when the Canadian federal government set out to streamline and simplify access to information on government websites. One technique used to effect this change was to reduce redundant, outdated and trivial (ROT) content.
“Unfortunately there was a lot of important content on these websites that wasn’t transferred,” says Susan Paterson, Government Publications Librarian at Koerner Library, who spearheaded the project. “Because so much of government information is produced digitally, or is born digital, web archiving is the prime way of doing any kind of digital collection development and preservation.”
Working in partnership with several other universities across Canada, Ringham and Paterson set up the Federal Government Websites collection to preserve some of that information before it was lost, with “the theory that lots of copies keep [information] safe. We didn’t want all these websites preserved on a single government server as this can impact accessibility,” says Paterson. Since that initial project, they have launched 15 other projects with more than 3 terabytes of captured web data to date.
When the COVID-19 outbreak began in March 2020, Ringham and Paterson were able to move quickly again to start capturing materials directly related to COVID-19. That first COVID-19 collection became a general information archive that includes updates from Provincial Health Officer Dr. Bonnie Henry and Health Minister Adrian Dix, the BC provincial government and City of Vancouver websites, and media releases, articles and other publications from news media, health authorities and professional organizations.
They have since expanded their efforts to create several web archives documenting different facets of the COVID-19 pandemic, including UBC’s response, K-12 education in British Columbia, the pandemic’s impacts on the Downtown Eastside, and incidents of racism against Asian communities in Canada. Together, these collections help preserve, with a focused local lens, the digital footprints of what will be a much-studied period in our history.
UBC Library captures content via Archive-It, a web archiving service that stores the collected data in data centers that are independently owned and operated by the non-profit digital library Internet Archive. Quality assurance takes time, says Paterson; it’s not just about running crawlers, the automated software used to search, index and capture live web content.
“Web archiving requires a great deal of work to ensure that we capture the sites accurately, and we try to work with UBC iSchool students whenever possible,” says Ringham, who noted that for the COVID-19 collections, they worked with students from UBC’s iSchool as well as one graduate student from the University of Toronto. Digital Initiatives also regularly partners with other academic institutions, community organizations, and UBC faculty to build out new collections.
“A lot of it is right now for posterity,” says Ringham. “But with the assumption that researchers will find it useful in the future.” With the recent surge of interest in digital scholarship, and with tools that support text and data mining becoming more mainstream, web archives are a valuable primary source for researchers across disciplines.
Learn more about web archiving at Digital Initiatives and explore all of UBC’s web archive collections.
This project is part of UBC Library’s strategic direction to advance research, learning and scholarship.
Learn more about our Strategic Framework.