1 of 36

Systems Librarianship 101: Moving Library Metadata

Blake Galbreath

Asst Head of Systems

Washington State University

2 of 36

Agenda

  • Common record types (MARC, Dublin Core, BIBFRAME)
  • Getting data using existing tools
  • Vs. Creating your own tools
  • Transform/normalize data
  • Tips from the audience

3 of 36

Common Record Types

MARC

XML

    • MARCXML
    • Dublin Core
    • BIBFRAME

4 of 36

MARC21

    • MAchine-Readable Cataloging
    • Format is unusual: a 24-character leader, a directory of field locations, then the field data
    • Saves space (which used to matter more)
    • Fields are marked by tags > indicators > subfields (control fields 001-009 have only a tag and data)
    • Some fields are repeatable, others are not
    • https://www.loc.gov/marc/bibliographic/
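To make the "tags > indicators > subfields" layout concrete, here is a minimal Python sketch that splits one binary (ISO 2709) MARC21 record into its parts: leader positions 12-16 give the base address of the data, the directory holds one 12-byte entry (tag, length, start) per field, and special characters terminate fields and delimit subfields. It assumes a single well-formed record encoded in UTF-8 (leader position 09 = "a"); MARC-8 records and malformed directories are not handled, so for real work a maintained tool such as MarcEdit or the pymarc library is the safer choice.

```python
# Minimal ISO 2709 (binary MARC21) reader: a sketch, not a full MARC library.
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"


def parse_marc_record(raw: bytes) -> tuple[str, list[tuple]]:
    """Split one binary MARC record into its leader and a list of fields."""
    leader = raw[:24].decode("ascii")
    base_address = int(leader[12:17])  # offset where the field data begins
    # Directory: 12-byte entries (3-char tag, 4-digit length, 5-digit start),
    # followed by a field terminator just before the base address.
    directory = raw[24:base_address - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base_address + start: base_address + start + length]
        data = data.rstrip(FIELD_TERMINATOR)
        if tag.startswith("00"):
            # Control fields (001-009): plain data, no indicators or subfields.
            fields.append((tag, data.decode("utf-8")))
        else:
            # Data fields: two indicator characters, then $-style subfields.
            indicators = data[:2].decode("utf-8")
            subfields = [
                (chunk[:1].decode("utf-8"), chunk[1:].decode("utf-8"))
                for chunk in data[2:].split(SUBFIELD_DELIMITER)
                if chunk
            ]
            fields.append((tag, indicators, subfields))
    return leader, fields
```

Lengths and offsets in the directory count bytes, not characters, which is why the sketch slices the raw bytes before decoding. A file of many records is simply records laid end to end, each ending in the record terminator (0x1D).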

5 of 36

MARC21 Record

=LDR  08446cam a2200733 i 4500
=001  99900464501501842
=008  181120t20192019nmuab\\\\b\\\\001\0\eng\\
=010  \\$a  2018055939
=035  \\$a(OCoLC)on1076374968
=245  00$aCeramics of the indigenous cultures of South America :$bstudies of production and exchange through compositional analysis /$cedited by Michael D. Glascock, Hector Neff, and Kevin J. Vaughn.
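MarcEdit's MarcBreaker writes records in the mnemonic form shown above: each line is "=", a tag, two spaces, then the data, with "\" standing in for a blank and "$" introducing each subfield. A minimal Python sketch of reading one such line back into its parts might look like this; the function name is hypothetical, and it assumes literal dollar signs in the data were escaped as {dollar}, as MarcEdit does.

```python
def parse_mnemonic_line(line: str):
    """Parse one mnemonic (.mrk) line, e.g. '=245  00$aTitle :$bsubtitle /'."""
    if not line.startswith("=") or line[4:6] != "  ":
        raise ValueError(f"Not a mnemonic field line: {line!r}")
    tag, data = line[1:4], line[6:]
    if tag == "LDR" or tag.startswith("00"):
        # Leader and control fields: fixed positions, backslash = blank.
        return tag, data.replace("\\", " ")
    # Data fields: two indicators (backslash = blank), then subfields.
    indicators = data[:2].replace("\\", " ")
    subfields = []
    # Assumption: a literal "$" in the data is escaped as {dollar},
    # so every "$" here starts a new subfield.
    for chunk in data[2:].split("$")[1:]:
        code, value = chunk[:1], chunk[1:]
        subfields.append((code, value.replace("{dollar}", "$")))
    return tag, indicators, subfields
```

For example, the 010 line above comes back as ("010", "  ", [("a", "  2018055939")]); note the padding spaces inside the LCCN, which is where a trim step earns its keep.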

6 of 36

XML

    • Format is common and relatively easy to work with
    • Elements are nested in a parent/child tree
    • Any text editor can open it, but one with syntax highlighting (e.g., Notepad++) is very useful

7 of 36

XML: Dublin Core

    • A set of 15 core elements (e.g., title, creator, date, identifier), plus additional DCMI terms, for describing resources
    • Each element is optional and may be repeated
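Because every Dublin Core element is optional and repeatable, code that reads it should expect zero, one, or many values per element. A minimal Python sketch using only the standard library, assuming the oai_dc wrapper that OAI-PMH repositories commonly return; the sample record and the dc_values helper are hypothetical:

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record: two subjects, no creator.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample photograph</dc:title>
  <dc:subject>Pottery</dc:subject>
  <dc:subject>Archaeology</dc:subject>
  <dc:identifier>https://example.org/item/1</dc:identifier>
</oai_dc:dc>"""


def dc_values(xml_text: str, element: str) -> list[str]:
    """Return every value of one DC element: [] if absent, several if repeated."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall(f"dc:{element}", NS) if el.text]


print(dc_values(SAMPLE, "subject"))  # ['Pottery', 'Archaeology']
print(dc_values(SAMPLE, "creator"))  # []
```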

8 of 36

Dublin Core Record

9 of 36

XML: MARCXML

    • MARC21 records expressed in XML format

10 of 36

MARCXML Record

<?xml version="1.0" encoding="UTF-8"?>
<collection>
    <record>
        <leader>08265cam a2200685 i 4500</leader>
        <controlfield tag="001">99900464501501842</controlfield>
        <controlfield tag="008">181120t20192019nmuab    b    001 0 eng  </controlfield>
        <datafield tag="010" ind1=" " ind2=" ">
            <subfield code="a">  2018055939</subfield>
        </datafield>
        ...
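The same record can be read with Python's built-in ElementTree. This sketch assumes that, as in the excerpt, no MARC namespace is declared; many MARCXML files declare xmlns="http://www.loc.gov/MARC21/slim", in which case the paths need that namespace. The helper names are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET

# The excerpt above, closed off so it parses.
MARCXML = """<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <record>
    <leader>08265cam a2200685 i 4500</leader>
    <controlfield tag="001">99900464501501842</controlfield>
    <datafield tag="010" ind1=" " ind2=" ">
      <subfield code="a">  2018055939</subfield>
    </datafield>
  </record>
</collection>"""


def control_field(record, tag):
    """Return the text of a control field (001-009), or None if absent."""
    return record.findtext(f"controlfield[@tag='{tag}']")


def subfield_values(record, tag, code):
    """Return every $code value in every datafield with this tag, trimmed."""
    path = f"datafield[@tag='{tag}']/subfield[@code='{code}']"
    return [(sf.text or "").strip() for sf in record.findall(path)]


# Parse bytes so the encoding declaration is honored.
root = ET.fromstring(MARCXML.encode("utf-8"))
for record in root.iter("record"):
    print(record.findtext("leader"))            # 08265cam a2200685 i 4500
    print(control_field(record, "001"))         # 99900464501501842
    print(subfield_values(record, "010", "a"))  # ['2018055939']
```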

11 of 36

XML: BIBFRAME

    • Intended replacement for MARC, but not yet widely implemented
    • Initiative to evolve bibliographic description standards to a linked data model
    • Organizes information into Work, Instance, and Item
    • For sample MARC-to-BIBFRAME conversions, see LoC > BIBFRAME Tools: https://id.loc.gov/tools/bibframe/

12 of 36

Migrating Records: Overview

  • Export data from the old system, or harvest it from a host (e.g., via OAI-PMH)
  • Transform it to conform to the new system
  • Ingest it into the new system

13 of 36

Alma/Primo: OAI-PMH

  • Ready-made UI
  • Requires little more than the OAI base URL and the set spec/name

14 of 36

Basics of OAI-PMH Syntax
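An OAI-PMH request is an HTTP GET to the repository's base URL with a verb and its arguments in the query string. The six verbs are Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord; ListRecords needs a metadataPrefix (e.g., oai_dc) and can be narrowed with set, from, and until. When a result is too large for one response, the repository returns a resumptionToken, which is then sent on its own with the verb. A small Python sketch of building these URLs; the oai_url helper and the example base URL are hypothetical:

```python
from urllib.parse import urlencode


def oai_url(base_url: str, verb: str, **params) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb}
    if "resumptionToken" in params:
        # resumptionToken is an exclusive argument: send it with the verb only.
        query["resumptionToken"] = params["resumptionToken"]
    else:
        query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


print(oai_url("https://example.org/oai", "ListRecords",
              metadataPrefix="oai_dc", set="coll1"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=coll1
```

Because from is a Python keyword, pass a date range as **{"from": "2020-01-01"}.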

15 of 36

Double-check Data

16 of 36

Tools to get data

OpenRefine

MarcEdit

Scripting

17 of 36

OpenRefine

  • Add Column by Fetching URLs

18 of 36

MarcEdit

MarcEdit: Your complete free MARC editing utility

    • https://marcedit.reeset.net/downloads
    • MarcBreaker/MarcMaker: convert binary MARC to editable mnemonic text and back
    • Convert between many data formats
    • Harvest OAI Records

19 of 36

MarcEdit: Harvest OAI Data

  • Very simple approach
  • Saves your data as a separate file for each resumption token (i.e., each page of results)

20 of 36

Script It

  • Set up script
  • Grab data
  • Save to file
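The three steps above can be sketched as a Python OAI-PMH harvester that, like MarcEdit, saves one file per resumption token (one page of results). This is a minimal sketch using only the standard library; the function names, output folder, and base URL are assumptions, and it does not retry failed requests or interpret OAI-PMH error responses.

```python
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url: str) -> bytes:
    """Download one OAI-PMH response."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def harvest(base_url, metadata_prefix, set_spec=None, out_dir="harvest", fetch=http_get):
    """Page through ListRecords, saving each response page to its own file.

    Returns the number of pages saved. Error responses (e.g. noRecordsMatch)
    are saved like any other page, so check the files before loading them.
    """
    # 1. Set up: output folder and the first request.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        query["set"] = set_spec
    page = 0
    while True:
        # 2. Grab one page of data.
        body = fetch(f"{base_url}?{urlencode(query)}")
        page += 1
        # 3. Save it to file.
        (out / f"page_{page:04d}.xml").write_bytes(body)
        token = ET.fromstring(body).find(".//oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return page
        # Follow-up requests send only the verb and the token.
        query = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A call might look like harvest("https://example.org/oai/request", "oai_dc", set_spec="my_collection"), with a real base URL and set spec substituted. The fetch parameter exists so the loop can be tested without a network connection.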

21 of 36

Why script when there are other tools?

Tools don't always fully address the project

Interface is broken or the exported data is insufficient

Someone went on vacation and forgot to leave the keys

You would like to set up an automated data retrieval system

22 of 36

Example: DSpace to Esploro

  • Esploro supported loading assets from standard OAI-PMH, but not from DSpace's XOAI format
  • The XOAI contained more information:
    • Link to the digital file
    • Asset organizational unit affiliation
  • Scripting the harvest also let us run the two systems in parallel for ~6 months without double entry

23 of 36

Workflow for harvesting data

  • Same concept as the CONTENTdm example, but using record IDs:
    • Create text file of Record IDs
    • Loop over Record IDs and append to base URL
    • Download XOAI records
    • Normalize data
    • Save data to file
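The workflow above might be sketched in Python as follows. It assumes the DSpace OAI endpoint exposes XOAI under the metadata prefix xoai and that the ID file lists full OAI identifiers, one per line; the function names and file layout are hypothetical, and the "normalize" step here is only whitespace trimming, a stand-in for whatever cleanup a real migration needs.

```python
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def harvest_by_id(id_file, base_url, out_file, metadata_prefix="xoai", fetch=http_get):
    """Fetch one GetRecord response per ID in id_file and save them in one XML file."""
    # 1. Text file of record IDs, one per line (blank lines ignored).
    text = Path(id_file).read_text(encoding="utf-8")
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    records = ET.Element("records")
    for record_id in ids:
        # 2. Append each ID to the base URL as a GetRecord request.
        url = base_url + "?" + urlencode({"verb": "GetRecord",
                                          "metadataPrefix": metadata_prefix,
                                          "identifier": record_id})
        # 3. Download the record.
        record = ET.fromstring(fetch(url)).find(f"{OAI}GetRecord/{OAI}record")
        if record is None:  # e.g. an idDoesNotExist error response
            print(f"No record returned for {record_id}")
            continue
        # 4. Normalize: trim stray whitespace from every text node.
        for element in record.iter():
            if element.text:
                element.text = element.text.strip()
        records.append(record)
    # 5. Save everything to one file.
    ET.ElementTree(records).write(out_file, encoding="utf-8", xml_declaration=True)
    return len(records)
```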

24 of 36

Example: EAD Data to Primo VE

25 of 36

Transform data

  • Find and replace
    • Regular expressions
    • Normalization rules
    • Scripting
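A regular-expression find and replace can also be scripted. The sketch below is a hypothetical example, not part of the original workflow: it works on MarcEdit mnemonic lines and limits the replacement to a single tag, here upgrading http:// links in 856 $u to https://.

```python
import re


def replace_in_field(lines, tag, pattern, replacement):
    """Regex find-and-replace applied only to lines for one MARC tag (.mrk format)."""
    prefix = f"={tag}  "
    return [
        prefix + re.sub(pattern, replacement, line[len(prefix):])
        if line.startswith(prefix) else line
        for line in lines
    ]


lines = ["=245  00$aTitle.", "=856  40$uhttp://example.org/item/1"]
# In the pattern "$" must be escaped; in a Python replacement string it is literal.
print(replace_in_field(lines, "856", r"\$uhttp://", "$uhttps://"))
# ['=245  00$aTitle.', '=856  40$uhttps://example.org/item/1']
```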

26 of 36

MarcEdit: Find and Replace

27 of 36

Discovery Import Profiles: Links

  • Normalize data during OAI-PMH import
    • $$LinkingParameter2 = collection\/(.*)\/id from dc:identifier
    • $$LinkingParameter3 = id\/([0-9]+) from dc:identifier
    • Combine to create Thumbnail Link: https://content.libraries.wsu.edu/digital/api/singleitem/collection/$$LinkingParameter2/id/$$LinkingParameter3/thumbnail
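What the two LinkingParameters do can be reproduced in a few lines of Python, which is handy for testing the regular expressions before putting them in the import profile. A sketch; the sample dc:identifier value (collection alias "sample", item 123) is hypothetical:

```python
import re

THUMBNAIL_TEMPLATE = ("https://content.libraries.wsu.edu/digital/api/singleitem/"
                      "collection/{alias}/id/{item_id}/thumbnail")


def thumbnail_link(identifier: str):
    """Build a thumbnail URL from a dc:identifier, or return None if it doesn't match."""
    collection = re.search(r"collection\/(.*)\/id", identifier)  # $$LinkingParameter2
    item = re.search(r"id\/([0-9]+)", identifier)                # $$LinkingParameter3
    if not (collection and item):
        return None
    return THUMBNAIL_TEMPLATE.format(alias=collection.group(1), item_id=item.group(1))


print(thumbnail_link("https://content.libraries.wsu.edu/digital/collection/sample/id/123"))
# https://content.libraries.wsu.edu/digital/api/singleitem/collection/sample/id/123/thumbnail
```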

28 of 36

Thumbnail

29 of 36

Primo VE Normalization Rules

  • Example: OCoLC#
        rule "Primo VE - Lds100"
            when
                MARC is "035"."a" AND MARC."035"."a" match ".*OCoLC.*"
            then
                set TEMP"1" to MARC."035"."a"
                remove string (TEMP"1","\\(OCoLC\\)")
                remove string (TEMP"1","on")
                remove leading and trailing spaces (TEMP"1")
                create pnx."display"."lds100" with TEMP"1"
        end
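For comparison, here is the same cleanup in Python, for example to check a batch of 035 values outside Primo VE. This is a sketch, not the rule's actual implementation: it uses a single regular expression, and unlike the rule as shown it also strips the older ocm and ocn prefixes, which you may or may not want for your data.

```python
import re

# "(OCoLC)", optional prefix (ocm / ocn / on), then the digits we keep.
OCLC_PATTERN = re.compile(r"\(OCoLC\)\s*(?:ocm|ocn|on)?\s*(\d+)")


def oclc_number(field_035a: str):
    """Return the bare OCLC number from a 035 $a value, or None if it isn't OCLC."""
    match = OCLC_PATTERN.search(field_035a)
    return match.group(1) if match else None


print(oclc_number("(OCoLC)on1076374968"))  # 1076374968
```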

30 of 36

Primo VE Normalization Rule: Before and After

31 of 36

Script: Mapping Old Data to New Structure
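One common way to script this mapping is a crosswalk table from old field names to new ones. The sketch below is hypothetical (the column names, the semicolon delimiter, and the Dublin Core targets are all assumptions) and shows splitting multi-valued fields and trimming values along the way:

```python
# Hypothetical crosswalk: old export column -> new (Dublin Core) element.
CROSSWALK = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Keywords": "dc:subject",
    "Handle": "dc:identifier",
}
MULTIVALUED = {"Author", "Keywords"}  # old columns holding ";"-separated lists


def map_row(old_row: dict) -> dict:
    """Map one old record (column -> value) to the new structure (element -> values)."""
    new_row = {}
    for old_field, value in old_row.items():
        target = CROSSWALK.get(old_field)
        if target is None or not value.strip():
            continue  # unmapped or empty: skipped here; log these in a real migration
        values = value.split(";") if old_field in MULTIVALUED else [value]
        new_row.setdefault(target, []).extend(v.strip() for v in values if v.strip())
    return new_row


print(map_row({"Title": " Ceramics ", "Keywords": "Pottery; Archaeology", "Notes": "x"}))
# {'dc:title': ['Ceramics'], 'dc:subject': ['Pottery', 'Archaeology']}
```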

32 of 36

XSLT

  • eXtensible Stylesheet Language Transformations
  • An XML-based language that uses XPath to select and rearrange elements of an XML document
  • Original XML + XSLT ==> XSLT Processor ==> New Document

33 of 36

From Original XML to New Document

<notification_data>
    <items>
        <physical_item_display_for_printing>
            <available_items>
                <available_item>
                    <call_number>E99.P9 W46 1999</call_number>
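In an XSLT stylesheet, the call number in the excerpt above would be selected with an XPath such as /notification_data/items/physical_item_display_for_printing/available_items/available_item/call_number (for example inside an xsl:value-of select attribute). Before writing the stylesheet, a path can be checked with Python's ElementTree, which supports a subset of XPath. ElementTree is not an XSLT processor; running XSLT itself from Python typically uses the third-party lxml library. A sketch, with the excerpt closed off so it parses:

```python
import xml.etree.ElementTree as ET

# The excerpt above, closed off; real Alma letter XML contains many more elements.
LETTER_XML = """<notification_data>
  <items>
    <physical_item_display_for_printing>
      <available_items>
        <available_item>
          <call_number>E99.P9 W46 1999</call_number>
        </available_item>
      </available_items>
    </physical_item_display_for_printing>
  </items>
</notification_data>"""

# Relative to the root <notification_data> element.
CALL_NUMBER_PATH = ("items/physical_item_display_for_printing/"
                    "available_items/available_item/call_number")


def call_numbers(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)  # root is <notification_data>
    return [el.text for el in root.findall(CALL_NUMBER_PATH)]


print(call_numbers(LETTER_XML))  # ['E99.P9 W46 1999']
```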

34 of 36

Closing Thoughts

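  • Use the same methods to back up your migrated data
  • Trim (removing leading and trailing whitespace) is a very helpful function!
  • Use CSV: it is simple and readable by spreadsheets, OpenRefine, and scripts alike
  • Document/comment everything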
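For the CSV and trim tips, a minimal Python sketch using the standard csv module; the column names and the write_trimmed_csv helper are hypothetical. Opening a real output file with newline="" follows the csv module's documented recommendation.

```python
import csv
import io


def write_trimmed_csv(rows, fieldnames, out):
    """Write dict rows as CSV, trimming leading/trailing whitespace from every value."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        cleaned = {}
        for name in fieldnames:
            value = row.get(name)
            cleaned[name] = "" if value is None else str(value).strip()
        writer.writerow(cleaned)


# Usage with an in-memory buffer; for a real file use
# open("export.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_trimmed_csv([{"mms_id": " 99900464501501842 ", "title": "Ceramics, studies "}],
                  ["mms_id", "title"], buffer)
print(buffer.getvalue())
```

Values containing commas (like the title here) are quoted automatically, which is one reason to let the csv module write the file rather than joining strings by hand.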

35 of 36

Questions / Comments / Feedback

What else can the audience recommend for migrating, using, and storing library data?

36 of 36

Works Consulted