1 of 16

Collections as Data @ Pitt

From Collection Records to Data Layers: A Critical Experiment in Collaborative Practice

Collections as Data Summative Forum

January 17, 2020

Tyrica Terry Kapral, Aaron Brenner, Matt Lavin, Gesina Phillips

University of Pittsburgh

GET THE SLIDES: bit.ly/36ZuHN4

2 of 16

Introductions: The Team

Tyrica Terry Kapral (Project Lead): Humanities Data Librarian, Digital Scholarship Services, University Library System

Aaron Brenner (Administrative Librarian): Associate University Librarian for Digital Scholarship and Creation in the University Library System

Matt Lavin (Disciplinary Scholar): Clinical Assistant Professor of English and Director of the Digital Media Lab

Gesina A. Phillips (Teaching and Learning Coordinator): Digital Scholarship Librarian, Digital Scholarship Services, University Library System


3 of 16

Introductions: Key Partners

Mike Bolam: Head of Metadata and Discovery, University Library System

Kristin Britanik: Digital Collections Coordinator, Archives and Special Collections, University Library System

Willow Gillingham: System Developer, Information Technology (IT), University Library System

Amy Murray Twyning: Director of Undergraduate Studies - Literature Program, Lecturer II, English Department

Dan Libertz: Teaching Fellow, English Department


4 of 16

About Our Project: Where to Explore


5 of 16

About Our Project: Goals & Strategies

Our project aims to develop effective strategies for accessing and enriching library collections data through research-driven and critically interpretive layers of additional data that are conducive to computational use.

  • Make library collections data accessible to scholars
    • source data: snapshots of library collections data (EAD, MODS, and MARC)
    • base layers: curated datasets of library collections data
  • Enable scholars to extend library collections data
    • extension layers: additional datasets that enrich/augment library collections data
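
The layering idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the column names (`item_id`, `title`, `date`, `genre_term`) and the sample rows are hypothetical, and it assumes both layers are CSVs that share an item identifier.

```python
import csv
import io

# Hypothetical base layer: curated item-level records from collections metadata.
BASE_CSV = """item_id,title,date
item-001,Sample Newsletter,1971
item-002,Sample Broadside,1968
"""

# Hypothetical extension layer: scholar-supplied genre terms keyed by item_id.
EXTENSION_CSV = """item_id,genre_term
item-001,newsletters
item-002,broadsides
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_layers(base_rows, extension_rows, key="item_id"):
    """Left-join extension rows onto base rows without modifying the base rows."""
    by_key = {row[key]: row for row in extension_rows}
    joined = []
    for row in base_rows:
        match = by_key.get(row[key], {})
        extra = {k: v for k, v in match.items() if k != key}
        joined.append({**row, **extra})
    return joined


joined = join_layers(read_rows(BASE_CSV), read_rows(EXTENSION_CSV))
for row in joined:
    print(row)
```

Keeping the extension layer in its own file, joined on a shared key, lets the base layer stay a faithful snapshot of the library's records while interpretive data accumulates alongside it.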


6 of 16

About Our Project: Goals & Strategies (cont’d)

  • Collaborate with scholars teaching computationally minded data stewardship practices
    • Course integrations + Datathon
  • Increase visibility of library collections, especially those reflecting the perspectives of marginalized/underrepresented groups. Our target collections feature...
    • the voices of African Americans, American labor unionists, American left-wing organizations, the LGBTQ community, and feminists.
    • a diverse array of serials (e.g., journals, magazines, newspapers, newsletters) and ephemera (e.g., broadsides, flyers, cartoons).


7 of 16

Challenges & Successes: Data

  • Sheer number and variety of library metadata formats
    • EAD, MODS, MARC, DC, RELS-EXT, RELS-INT, TECHMD, etc.
  • Data was worse than we thought
    • Dirty (e.g., often converted from other data formats)
    • Inconsistent due to changing data entry practices over the years
    • Sparse, a far cry from our idealized “base layer” → had to cut back
  • Documentation was missing in places
  • The Metadata and Discovery team worked hard to help us, so we got something to work with


8 of 16

Deliverables: Data

Data Repository

  • 3 folders
  • Each subfolder represents a Pitt ULS collection
  • 2 CSVs in base layer
    • collection-level
    • item-level


9 of 16

Challenges & Successes: Code

  • Hack time
  • XML
  • Working code vs. sustainable code


10 of 16

Deliverables: Code (Python Script)

  • Hierarchical vs. flat
  • Field selection
  • Reads multiple schemas
  • Collection- and Item-level data
  • Automation
  • Command line flags
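
As a rough sketch of what "hierarchical vs. flat" and "field selection" can look like in practice, the example below flattens one MODS record into a CSV row and exposes field selection as a command-line flag. It is not the project's script: the `FIELD_PATHS` map, the `--fields` and `--input` flags, the pipe separator for repeated values, and the sample record are all assumptions made for illustration. Only the MODS namespace and element names come from the MODS schema itself.

```python
import argparse
import csv
import sys
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical field map: flat column name -> path into the MODS hierarchy.
FIELD_PATHS = {
    "title": "mods:titleInfo/mods:title",
    "date": "mods:originInfo/mods:dateIssued",
    "genre": "mods:genre",
}

# Hypothetical sample record, used when no input file is given.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample Newsletter</title></titleInfo>
  <originInfo><dateIssued>1971</dateIssued></originInfo>
  <genre>newsletters</genre>
  <genre>serials</genre>
</mods>"""


def flatten(xml_text, fields):
    """Flatten one hierarchical MODS record into a single dict (one CSV row)."""
    root = ET.fromstring(xml_text)
    row = {}
    for name in fields:
        elements = root.findall(FIELD_PATHS[name], NS)
        values = [(el.text or "").strip() for el in elements]
        # Repeated elements are collapsed into one pipe-separated cell.
        row[name] = "|".join(v for v in values if v)
    return row


def main(argv=None):
    parser = argparse.ArgumentParser(description="Flatten a MODS record to CSV.")
    parser.add_argument("--fields", nargs="+", choices=sorted(FIELD_PATHS),
                        default=sorted(FIELD_PATHS), help="columns to output")
    parser.add_argument("--input", help="path to a MODS XML file (sample if omitted)")
    args = parser.parse_args(argv)

    if args.input:
        with open(args.input, encoding="utf-8") as handle:
            xml_text = handle.read()
    else:
        xml_text = SAMPLE

    writer = csv.DictWriter(sys.stdout, fieldnames=args.fields)
    writer.writeheader()
    writer.writerow(flatten(xml_text, args.fields))


# Demo with explicit arguments; a real script would call main() to read sys.argv.
main(["--fields", "title", "genre"])
```

Collapsing repeated elements into one delimited cell is only one way to handle the hierarchy; emitting one row per repeated value is an equally reasonable choice when the data will be loaded into a relational tool.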


11 of 16

Challenges & Successes: Instructional Modules

  • Cancellation of Fall 2019 course
  • Relied on volunteers to let us run modules as guest sessions
  • No opportunity for continuous engagement, feedback
  • Students seemed engaged, had thoughtful responses
  • Participants raised new ideas


12 of 16

Deliverables: Instructional Modules

Forthcoming:

  • Generalizing content
  • Modules need to be clear and easy to implement
  • Motivation and payoffs need to be well articulated
  • Integrating critique into various modules


Modules to be released for reuse/modification:

Develop a Custom Collection

Design a Layer

Critique a Layer

Implement a Layer

Visualize a Layer


13 of 16

Deliverables: Instructional Modules (cont’d)

Complete

  • Pilot course integrations
    • Extension layers: genre terms (AAT), term frequencies

Upcoming

  • Datathon (Implement a Layer)
    • Extension layer: geographical data and descriptive tags
  • Second iteration of course integration
  • First Experiences in Research student work
    • TBD
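
A term-frequency extension layer like the one listed under "Complete" might be built along these lines. This is a hedged sketch only: the item identifiers, the sample text, the tokenization rule, and the `item_id`/`term`/`count` column names are hypothetical, not taken from the pilot course materials.

```python
import re
from collections import Counter

# Hypothetical item texts (e.g., OCR or transcriptions) keyed by item identifier.
ITEMS = {
    "item-001": "Union members vote to strike. Union leaders meet Friday.",
    "item-002": "Community meeting Friday at the union hall.",
}


def term_frequencies(text):
    """Count lowercase alphabetic tokens in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def build_layer(items):
    """Build extension-layer rows: one row per (item, term) pair."""
    rows = []
    for item_id, text in sorted(items.items()):
        for term, count in sorted(term_frequencies(text).items()):
            rows.append({"item_id": item_id, "term": term, "count": count})
    return rows


layer = build_layer(ITEMS)
for row in layer[:5]:
    print(row)
```

Storing the counts in long form, one row per item and term, keeps the layer joinable to an item-level base layer on the shared identifier.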


14 of 16

Deliverables: Documentation

  • Principles and objectives of this work
    • why data layers?
  • Requirements / what you need
    • heavyweight and lightweight versions
  • How to use repository (e.g., navigating, contributing)
  • Data dictionary
  • Workflow models
    • Data extraction and transformation [Internal]
    • User data layer creation [External]


15 of 16

Post-grant Plans

  • Implementing workflows
    • Coordinating between Digital Scholarship Services, Metadata and Discovery, IT, Archives & Special Collections
    • Continued work with faculty and students (e.g., Archival Scholar Research Award students, courses)
  • Paper/Article
  • Future project ideas
    • Database platform for browsing, depositing, visualizing data(sets)
    • GUI for creating datasets
    • Integrating datasets with collections records


16 of 16

Questions?

Tyrica Terry Kapral

tyt3@pitt.edu

Aaron Brenner

abrenner@pitt.edu

Matt Lavin

lavin@pitt.edu

Gesina Phillips

gap64@pitt.edu
