1 of 38

Accessible Web Archives:

Rethinking and Designing Usable Infrastructure for Sustainable Research Platforms

Samantha Fritz, MLIS

Project Manager, Archives Unleashed

WAC, 15-16 June 2021

2 of 38

of logging onto the WWW

3 of 38

“Meet the demand for automated information-sharing between scientists in universities and institutes around the world”

4 of 38

fastest growing communications medium of all time

5 of 38

The web has shaped how we connect with one another and interact with information.

6 of 38

We all have a relationship to data

  • Organize
  • Search
  • Provide access

7 of 38

We also use data to interpret and understand the world around us

8 of 38

The web has provided a new context for research data that impacts the way we produce, preserve and interact with information.

9 of 38

  • 4.66 billion internet users
  • 95 million photos & videos per day
  • 306.4 billion emails per day
  • 1.7 MB of data per second per person

10 of 38

800 web pages have been created since the start of this presentation

11 of 38

we risk losing potentially significant information

12 of 38

Development of Web Archiving

1996-2021

Increasing adoption of web archiving mandates among memory institutions around the world

1992

World Wide Web is launched

1996

Conscious effort to preserve born-digital content

First large-scale preservation projects

13 of 38

The web is a critical source for studying our digital cultural heritage

14 of 38

Opportunities

Expands scope to incorporate a wider and more diverse range of voices and perspectives

Shift in scale from resource scarcity to abundance

(Roy Rosenzweig)

15 of 38

Challenges are inevitable when dealing with data

Occur throughout web archiving lifecycle:

  • Selection
  • Collection
  • Organization & Storage
  • Description/Metadata
  • Access & Use

Challenges

16 of 38

Despite the volume of data captured, web archives have largely remained inaccessible

17 of 38

Barriers to Access & Use

  • Required understanding of high-performance computing
  • Familiarity with command line
  • Lag in analytical tools
  • Limitations of time, resources, support

18 of 38

How can we lower barriers of access and use to web archives?

19 of 38

Archives Unleashed Project

2017 - 2020

Image: Archives Unleashed Project Timeline

20 of 38

Tools & Platforms

21 of 38

  • Open-source platform for analyzing web archives

  • Built on Apache Spark to provide powerful tools for analytics and data processing

  • Applies modern big data analytics infrastructure

Archives Unleashed Toolkit

Image: Archives Unleashed Toolkit via Sparkshell

22 of 38

  • Documentation provides pre-built scripts

  • Analytic tasks:
    • Collection Analytics
    • Text Analysis
    • Network Analysis
    • Binary Extraction

Archives Unleashed Toolkit

Image: Archives Unleashed Toolkit Documentation
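
To make two of the analytic tasks above more concrete, here is a minimal sketch in plain Python of what "collection analytics" (pages per domain) and "network analysis" (links between domains) can compute. It is not the Toolkit's API: the `RECORDS` sample and the `domain`, `domain_counts` and `link_edges` helpers are hypothetical names invented for illustration, and the Toolkit itself runs this kind of work on Apache Spark over real WARC files.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical crawl records: (crawl date, page URL, outbound link URLs).
# Illustrative data only; real records would be read from WARC files.
RECORDS = [
    ("20210615", "https://example.org/news/a",
     ["https://example.com/x", "https://example.net/y"]),
    ("20210615", "https://example.org/news/b",
     ["https://example.com/z"]),
    ("20210616", "https://www.example.com/x",
     ["https://example.org/news/a"]),
]


def domain(url):
    """Return the host part of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_counts(records):
    """Collection analytics: number of captured pages per domain."""
    return Counter(domain(url) for _, url, _ in records)


def link_edges(records):
    """Network analysis: number of links from one domain to another."""
    edges = Counter()
    for _, url, links in records:
        for target in links:
            edges[(domain(url), domain(target))] += 1
    return edges


if __name__ == "__main__":
    for name, count in domain_counts(RECORDS).most_common():
        print(name, count)
    for (source, target), weight in link_edges(RECORDS).items():
        print(source, "->", target, weight)
```

The weighted edge list produced by `link_edges` is the kind of derivative that could be loaded into a graph tool for visual exploration.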

23 of 38

  • Uses Toolkit code base
  • One-stop, web-based portal
  • Scholars ingest their Archive-It collections and execute a number of analyses with the click of a mouse
  • Generate and explore derivatives and in-browser visualizations

Archives Unleashed Cloud

Image: Archives Unleashed Cloud

24 of 38

  • The Cloud will be sunsetting at the end of June 2021
  • Efforts continue in collaboration with Archive-It to integrate and implement a new Cloud interface

Archives Unleashed Cloud

Image: ARS-Cloud Prototype, Concept Design

25 of 38

Accessibility & Usability

26 of 38

Defining Access & Use

Access / Accessibility

the ability to make use of something, or capability of being reached, used, understood or appreciated

Usability

“The quality or state of being usable; ease of use”

We cannot talk about access without acknowledging the vital role usability plays

27 of 38

Applications of Access and Use Concepts

28 of 38

  • Transparent tool; there are no hidden processes, as researchers see both their analysis requests and the output results
  • Integrated feedback and addressed growing areas of interest
  • Publicly available, free, open-source
  • Robust and flexible in conducting large-scale web archive analysis
  • Integrated widely adopted and stable programming languages and best practices

Code Base

29 of 38

  • The Cloud provides a web-based front end to the Toolkit
  • Addresses hesitancy about using the command line
  • Interface that is intuitive and familiar
  • Individuals tend to be more comfortable with a click-to-results workflow
  • Broadens access to web archives using the WASAPI data transfer API
  • Individuals already set up with an Archive-It account are able to ingest and explore their WARC files
  • Examined experimental approaches to expand access, use and interoperability with other platforms

User Interface

30 of 38

Datasets

Collaboration to process web archive collections and make derivatives available for all to use and explore.

Great starting point for scholars who might not have access to web archive collections

Learning Resources

Learning guides provide instructions on how to use and explore Cloud derivatives with external tools like Gephi and AntConc.

Toolkit Documentation

Cookbook approach, with pre-built scripts that users can plug in to address common analytic tasks.

Addresses uncertainty about how to use the Toolkit

Supporting Materials

31 of 38

General Takeaways

  • Web archives are a critical source for many scholars studying post-1990 topics
  • Despite the volume of data captured by institutions across the globe, web archives have largely remained inaccessible and difficult to use.
  • Archives Unleashed has focused on lowering barriers to access and use by contributing two substantial tool developments to the community.
  • Our team has thoughtfully integrated concepts of access and usability throughout project development cycles

32 of 38

General Takeaways

  • Archives Unleashed incorporates the spirit of access and usability by:
    • Providing access points for exploring web archives via development of the Toolkit and Cloud
    • Creating tools that are as user-friendly as possible: robust, transparent, flexible, and intuitive
    • Creating documentation and resources to support training and learning
  • Access and usability need to work in partnership

33 of 38

If you build it, they will come

In designing infrastructure for sustainable research platforms, we need to thoughtfully apply concepts of access and usability

34 of 38

If you design it, will it be usable?

In designing infrastructure for sustainable research platforms, we need to thoughtfully apply concepts of access and usability

35 of 38

CREDITS

36 of 38

References

37 of 38

Images Used

In order of appearance; image title provided where possible.

38 of 38

https://archivesunleashed.org