1 of 24

Who are you calling generic? Using Generic XML normalization rules for ETDs in Primo VE

Elliot Williams, Metadata Strategist

University of Texas at San Antonio

Ex Libris Southcentral Users Group (ELSUG), Dec 6, 2024

2 of 24

Background

  • UTSA has been using Alma and Primo VE since 2018
  • We use DSpace as our institutional repository, hosted by Texas Digital Library
  • Started adding electronic theses and dissertations (ETDs) to our IR in 2024
  • DSpace records only started being added to Primo VE in 2023, so we were starting with a relatively clean slate for managing the process

3 of 24

Quick Overview of Harvesting External Records in Primo VE

  • Typically used for harvesting metadata from an external system into Primo.
  • The external system provides XML metadata via OAI-PMH, then Alma imports that metadata and transforms it for display & searching in the Primo environment.

External System → Discovery Import Profile harvests records → Normalization Rules transform records to Primo structure → Records are searchable in Primo

4 of 24

Quick Overview of Harvesting External Records in Primo VE

  • Discovery Import profile – Defines the actual import of records (OAI-PMH endpoint, metadata format to use, etc.)
    • Metadata source format can be Dublin Core, Generic XML, or MARC
  • Normalization rules (Discovery) – Defines how to map the metadata from the OAI-PMH feed to Primo’s metadata format
    • Depending on the source format, these may be simple or complex
    • Normalization rules have to be added to a normalization process, which is then applied to a discovery import profile.
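As a rough illustration of what a Discovery Import Profile requests from the external system, the Python sketch below builds OAI-PMH `ListRecords` URLs. The base URL and the set name are hypothetical placeholders, and `xoai` / `oai_dc` are assumed to be the metadata prefixes your DSpace instance advertises for the two formats compared in this deck (check its `ListMetadataFormats` response).

```python
from urllib.parse import urlencode

# Hypothetical DSpace OAI-PMH base URL; substitute your repository's endpoint.
BASE_URL = "https://repository.example.edu/oai/request"


def list_records_url(base_url, metadata_prefix="xoai", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: follow-up pages
    send only the verb and the token, not metadataPrefix or set.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# First page of a (hypothetical) ETD collection, in XOAI and in plain Dublin Core.
print(list_records_url(BASE_URL, set_spec="col_123456789_1"))
print(list_records_url(BASE_URL, metadata_prefix="oai_dc", set_spec="col_123456789_1"))
```

Switching only the metadata prefix is what separates a Dublin Core harvest from an XOAI one; the records themselves come from the same endpoint.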


5 of 24

Discovery import profile

6 of 24

Dublin Core vs Generic XML Normalization

DUBLIN CORE

  • Mapping is pre-defined by Ex Libris
  • Can only be used with DC metadata
  • Limited ways you can manipulate data
  • Often, all you have to do is move data from one field to another and include resource type mappings

GENERIC XML

  • No default mapping – you have to specify everything
  • Can be used with any XML metadata (DC or not)
  • More ability to manipulate data using XPath functions
  • Can only map XML elements to Dublin Core fields or local Primo fields, not “standard” Primo fields

7 of 24

DSpace in Primo

  • When we started adding non-ETD metadata from DSpace, we used the standard Dublin Core normalization rules.
  • However, DSpace’s Dublin Core output doesn’t include all of the thesis-specific metadata fields. And for the DC fields it does include, the OAI-PMH metadata drops the qualifiers that DSpace uses.
    • For example, dc.date.available and dc.date.issued both just become dc.date.
  • DSpace also provides metadata in XOAI format, which includes every element along with its qualifiers
  • Two advantages to XOAI and Generic XML:
    • More, and more granular, metadata from DSpace
    • Greater ability to manipulate the metadata in Primo

8 of 24

DSpace XOAI metadata

<element name="contributor">
  <element name="author">
    <element name="none">
      <field name="value">Bonner, Robert Lee</field>
      <field name="authority">56c48883-c40b-43c3-a50b-bf382f611437</field>
    </element>
  </element>
  <element name="committeeMember">
    <element name="none">
      <field name="value">Lengnick-Hall, Cynthia A.</field>
      <field name="value">Rudy, Bruce C.</field>
      <field name="value">Keating, Jerome P.</field>
    </element>
  </element>
</element>
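To make the nesting concrete, this standard-library Python sketch walks the sample above and prints each value with the dotted path of `element` names above it: element, qualifier, then a language level (`none`). Wrapping the sample in a `<metadata>` root and a `dc` schema element is an assumption about the surrounding record, chosen to mirror the paths used in the rules in this deck.

```python
import xml.etree.ElementTree as ET

# The sample above, wrapped in an assumed XOAI <metadata> root and "dc" schema element.
SAMPLE = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="contributor">
      <element name="author">
        <element name="none">
          <field name="value">Bonner, Robert Lee</field>
          <field name="authority">56c48883-c40b-43c3-a50b-bf382f611437</field>
        </element>
      </element>
      <element name="committeeMember">
        <element name="none">
          <field name="value">Lengnick-Hall, Cynthia A.</field>
          <field name="value">Rudy, Bruce C.</field>
          <field name="value">Keating, Jerome P.</field>
        </element>
      </element>
    </element>
  </element>
</metadata>"""


def local_name(el):
    # Strip the "{namespace}" prefix ElementTree adds, like XPath's local-name().
    return el.tag.rsplit("}", 1)[-1]


def flatten(el, path=()):
    """Yield (dotted element path, text) for every <field name="value">."""
    for child in el:
        if local_name(child) == "element":
            yield from flatten(child, path + (child.get("name"),))
        elif local_name(child) == "field" and child.get("name") == "value":
            yield ".".join(path), child.text


for key, value in flatten(ET.fromstring(SAMPLE)):
    print(key, "=", value)
```

The `authority` field is skipped because only `name="value"` fields are collected, which is also what the `[@name='value']` predicate does in the rules.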

9 of 24

Example: Thesis Advisors & Committee Members

  • In Dublin Core, ETD advisors and committee members all go in <dc:contributor>. In XOAI, each role has its own subelement.

<dc:contributor>Ahmed, Sara</dc:contributor>
<dc:contributor>Jin, Yufang</dc:contributor>
<dc:contributor>Cao, Yongcan</dc:contributor>
<dc:contributor>Alamaniotis, Miltos</dc:contributor>
<dc:contributor>Ramasubramanian, Deepak</dc:contributor>

<element name="contributor">
  <element name="advisor">
    <element name="none">
      <field name="value">Ahmed, Sara</field>
    </element>
  </element>
  <element name="committeeMember">
    <element name="none">
      <field name="value">Jin, Yufang</field>
      <field name="value">Cao, Yongcan</field>
      <field name="value">Alamaniotis, Miltos</field>
      <field name="value">Ramasubramanian, Deepak</field>
    </element>
  </element>
</element>

10 of 24

Example: Thesis Advisors & Committee Members

Generic XML normalization rules:

  1. Look for elements with nested <element name="contributor"><element name="advisor">
  2. Extract the value of that element
  3. Concatenate “Advisor: ” to the beginning of that string
  4. Add the concatenated value to discovery.local3 field (a local display field that displays as “General Note”)

Then do the same thing for Committee Members.

(Normalization rules can’t loop over repeated elements, so I had to write one rule per possible instance: 8 for committee members and 2 for advisors.)
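For comparison, the four steps can be sketched in Python, where a single `findall()` returns every advisor or committee member, so no per-instance rules are needed. Assumptions: Python 3.8+ (the `{*}` namespace wildcard plays the role of `local-name()` in the rules), a `dc` schema wrapper around the sample XML, and the "Committee Member" label, which is hypothetical. The `local3` list simply stands in for the local display field.

```python
import xml.etree.ElementTree as ET

# Advisor/committee sample from the previous slide, with an assumed "dc" wrapper.
SAMPLE = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="contributor">
      <element name="advisor">
        <element name="none">
          <field name="value">Ahmed, Sara</field>
        </element>
      </element>
      <element name="committeeMember">
        <element name="none">
          <field name="value">Jin, Yufang</field>
          <field name="value">Cao, Yongcan</field>
          <field name="value">Alamaniotis, Miltos</field>
          <field name="value">Ramasubramanian, Deepak</field>
        </element>
      </element>
    </element>
  </element>
</metadata>"""

# Step 1: schema element / contributor / role / language / value field.
ROLE_PATH = ("{*}element/{*}element[@name='contributor']"
             "/{*}element[@name='%s']/{*}element/{*}field[@name='value']")


def role_notes(root, role, label):
    """Steps 2-3: extract every value for a role and prefix it with a label."""
    return [label + ": " + f.text for f in root.findall(ROLE_PATH % role)]


root = ET.fromstring(SAMPLE)
# Step 4: collect everything destined for the local display field.
local3 = role_notes(root, "advisor", "Advisor") + role_notes(root, "committeeMember", "Committee Member")
for note in local3:
    print(note)
```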

11 of 24

Example: Thesis Advisors & Committee Members

// ADVISORS & COMMITTEE MEMBERS
// Concatenate with display label, and add to local display field 3 (General Note).
// Need to have one rule for each possible instance of each field (up to 8 cmte members and 2 advisors). Rules are in inverse order here, so they will display in same order as DSpace.
rule "advisor note 2"
when
exist "//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='contributor']/*[local-name()='element'][@name='advisor']/*[local-name()='element']/*[local-name()='field'][@name='value'][2]"
then
copy "concat('Advisor: ',//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='contributor']/*[local-name()='element'][@name='advisor']/*[local-name()='element']/*[local-name()='field'][@name='value'][2])" to "discovery"."local3"
end

Looks for an “advisor” element under the “contributor” element

This rule is for the 2nd instance of the element; a similar rule handles the 1st instance.

Uses “concat” to prepend a label to the advisor value, and adds the result to local field 3
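Since the rule set needs one nearly identical rule per possible instance, one option is to generate the rule text instead of typing it. This Python sketch fills a template copied from the rule above, emitting rules from the highest instance down, per the "inverse order" comment. The committee-member rule name and its "Committee Member" label are hypothetical, and any generated text should still be validated in Alma before use.

```python
# Shared XPath prefix, copied from the "advisor note 2" rule above.
BASE = ("//*[local-name()='record']/*[local-name()='metadata']"
        "/*[local-name()='metadata']/*[local-name()='element']"
        "/*[local-name()='element'][@name='contributor']")

# Rule layout copied from the slide; {placeholders} are filled per instance.
RULE_TEMPLATE = """rule "{rule_name} {n}"
when
exist "{xpath}"
then
copy "concat('{label}: ',{xpath})" to "discovery"."local3"
end"""


def role_rules(role, rule_name, label, max_count):
    """Generate one rule per possible instance of a contributor role.

    Rules are emitted from max_count down to 1 (inverse order).
    """
    rules = []
    for n in range(max_count, 0, -1):
        xpath = BASE + (
            "/*[local-name()='element'][@name='%s']/*[local-name()='element']"
            "/*[local-name()='field'][@name='value'][%d]" % (role, n)
        )
        rules.append(RULE_TEMPLATE.format(rule_name=rule_name, n=n, label=label, xpath=xpath))
    return rules


# 8 committee-member rules and 2 advisor rules, as described above.
rules = (role_rules("committeeMember", "committee member note", "Committee Member", 8)
         + role_rules("advisor", "advisor note", "Advisor", 2))
print("\n\n".join(rules))
```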

12 of 24

Example: Thesis Advisors & Committee Members

Using Dublin Core:

Using Generic XML:

13 of 24

Example: Degree information

  • ETD-specific fields (e.g. degree type, department) are not included in the OAI-DC feed. Using the XOAI metadata meant we could include those fields, and normalization rules could be used to combine them into a single field for display.

<element name="thesis">
  <element name="degree">
    <element name="department">
      <element name="none">
        <field name="value">Electrical and Computer Engineering</field>
      </element>
    </element>
    <element name="name">
      <element name="none">
        <field name="value">Doctor of Philosophy</field>
      </element>
    </element>
  </element>
</element>

Dissertation (Doctor of Philosophy) – University of Texas at San Antonio, Electrical and Computer Engineering, 2023

14 of 24

Example: Degree Information

// DISSERTATION NOTE
// Concatenate from thesis.degree fields, and add to local display field 3 (General Note)
// Format should be: Dissertation/Thesis ({degree.name}) -- {degree.grantor}, Department of {degree.department}, {date.issued}

rule "Dissertation note - Doctoral"
when
"//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='degree']/*[local-name()='element'][@name='level']/*[local-name()='element']/*[local-name()='field'][@name='value']" equals "Doctoral"
then
copy "concat('Dissertation (',//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='degree']/*[local-name()='element'][@name='name']/*[local-name()='element']/*[local-name()='field'][@name='value'],') -- ',//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='degree']/*[local-name()='element'][@name='grantor']/*[local-name()='element']/*[local-name()='field'][@name='value'],', Department of ',//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='degree']/*[local-name()='element'][@name='department']/*[local-name()='element']/*[local-name()='field'][@name='value'],', ',//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='date']/*[local-name()='element'][@name='issued']/*[local-name()='element']/*[local-name()='field'][@name='value'])" to "discovery"."local3"
end

Looks for degree.level element with a value “Doctoral”

Combines degree.name, degree.grantor, degree.department, and date.issued elements into one string
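A Python sketch of what the doctoral rule computes, useful for checking expected output outside Alma. It follows the rule's format string, including "Department of" (which the display example on the previous slide omits), and returns an empty string for a missing field, as XPath `concat()` does for an empty node-set. Sample values come from that display example; the `Masters` to `Thesis` entry is an assumed value for a second rule not shown here. Requires Python 3.8+.

```python
import xml.etree.ElementTree as ET

# Hypothetical XOAI record built from the display example on the previous slide.
SAMPLE = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="date">
      <element name="issued">
        <element name="none"><field name="value">2023</field></element>
      </element>
    </element>
  </element>
  <element name="thesis">
    <element name="degree">
      <element name="department">
        <element name="none"><field name="value">Electrical and Computer Engineering</field></element>
      </element>
      <element name="grantor">
        <element name="none"><field name="value">University of Texas at San Antonio</field></element>
      </element>
      <element name="level">
        <element name="none"><field name="value">Doctoral</field></element>
      </element>
      <element name="name">
        <element name="none"><field name="value">Doctor of Philosophy</field></element>
      </element>
    </element>
  </element>
</metadata>"""

# Note type per degree.level; "Masters" -> "Thesis" is an assumption.
NOTE_TYPE = {"Doctoral": "Dissertation", "Masters": "Thesis"}


def first_value(root, element, qualifier):
    """First value of <any schema>.element.qualifier, or "" if absent (like XPath string())."""
    path = ("{*}element/{*}element[@name='%s']/{*}element[@name='%s']"
            "/{*}element/{*}field[@name='value']" % (element, qualifier))
    field = root.find(path)
    return "" if field is None else (field.text or "")


def degree_note(root):
    note_type = NOTE_TYPE.get(first_value(root, "degree", "level"))
    if note_type is None:
        return None  # no rule fires, so nothing is written to local3
    return "%s (%s) -- %s, Department of %s, %s" % (
        note_type,
        first_value(root, "degree", "name"),
        first_value(root, "degree", "grantor"),
        first_value(root, "degree", "department"),
        first_value(root, "date", "issued"),
    )


print(degree_note(ET.fromstring(SAMPLE)))
```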

15 of 24

Challenges and Trade-offs

  • Normalization rules are very long, because the XOAI metadata structure is very verbose.
    • I tried to compensate for this by adding a comment to each rule.
    • Additional limitation: a comment cannot be the first line of a normalization rule set.
  • Generic XML normalization rules can’t map to specific Primo display fields, only to Dublin Core fields or local display fields.
  • Adding complexity means adding risk, and likely more work down the road.

16 of 24

Epilogue: Where we are now

  • Ultimately, much of this work did not end up in the records that display to users.
  • We also have MARC records for most of our ETDs that are in ProQuest. We figured out how to get those records to dedup with the records imported from DSpace (hooray!), but the MARC record usually takes precedence, so that is what displays to users in Primo.
  • However, the lessons learned from writing these normalization rules for ETDs led me to also re-write our normalization rules for other DSpace collections. So all of our DSpace records that are imported to Primo now use generic XML normalization rules, and have richer metadata in Primo because of it.

17 of 24

Useful documentation & examples

18 of 24

Thank you!
elliot.williams@utsa.edu

19 of 24

More Examples of Normalization Rules

20 of 24

Example: Open Access indicator

  • We wanted to include Primo’s OA indicator, but only on ETDs that were truly open – not those that were under embargo or restricted to UTSA users.

// OPEN ACCESS INDICATOR
// Only apply when access-status="open.access" (this excludes restricted and embargoed items)
rule "add OA indication"
when
"//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element'][@name='others']/*[local-name()='element'][@name='access-status']/*[local-name()='field'][@name='value']" equals "open.access"
then
set "Unrestricted online access" in "dcterms"."accessRights"
end

Looks for an “access-status” element with the value “open.access”

Writes a pre-defined value into the dcterms.accessRights field
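The same condition as a Python sketch: the indicator is set only when the access-status value is exactly `open.access`, so any other status leaves `dcterms.accessRights` untouched. The `record()` helper builds hypothetical minimal records, and `embargo` and `restricted` are assumed examples of non-open status values. Requires Python 3.8+.

```python
import xml.etree.ElementTree as ET

# "others" sits directly under the XOAI root, as in the rule's XPath.
OA_PATH = ("{*}element[@name='others']/{*}element[@name='access-status']"
           "/{*}field[@name='value']")


def access_rights(xoai_xml):
    """Return what the OA rule writes to dcterms.accessRights, or None if it doesn't fire."""
    root = ET.fromstring(xoai_xml)
    field = root.find(OA_PATH)
    if field is not None and field.text == "open.access":
        return "Unrestricted online access"
    return None


def record(status):
    # Hypothetical minimal XOAI record carrying only the access status.
    return ('<metadata xmlns="http://www.lyncode.com/xoai">'
            '<element name="others"><element name="access-status">'
            '<field name="value">%s</field>'
            '</element></element></metadata>' % status)


for status in ("open.access", "embargo", "restricted"):
    print(status, "->", access_rights(record(status)))
```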

21 of 24

Example: ETD Publication Date

  • In Dublin Core metadata, multiple date fields are included in <dc:date>, including the date the item was added to DSpace and the date the embargo will expire. In XOAI metadata, those dates have their qualifiers included.
  • For Primo, we only wanted to include the date the ETD was published.

<element name="date">
  <element name="available">
    <element name="none">
      <field name="value">2024-01-25T19:32:03Z</field>
      <field name="value">2025-05-24</field>
    </element>
  </element>
  <element name="issued">
    <element name="none">
      <field name="value">2018</field>
    </element>
  </element>
</element>

<dc:date>2024-01-25T19:32:03Z</dc:date>
<dc:date>2025-05-24</dc:date>
<dc:date>2018</dc:date>

22 of 24

Example: ETD Publication Date

// DATE
// Only using dc.date.issued for dc.date
rule "dc.date"
when
exist "//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='date']/*[local-name()='element'][@name='issued']"
then
copy "//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='date']/*[local-name()='element'][@name='issued']/*[local-name()='element']/*[local-name()='field'][@name='value']" to "dc"."date"
end
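This Python sketch contrasts the two behaviors on the date sample: collecting every `date` value regardless of qualifier (roughly the collapse seen in the `<dc:date>` output), versus the rule's `date.issued`-only path. The `dc` wrapper element is assumed; requires Python 3.8+.

```python
import xml.etree.ElementTree as ET

# The date sample from the previous slide, wrapped in an assumed "dc" schema element.
SAMPLE = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="date">
      <element name="available">
        <element name="none">
          <field name="value">2024-01-25T19:32:03Z</field>
          <field name="value">2025-05-24</field>
        </element>
      </element>
      <element name="issued">
        <element name="none">
          <field name="value">2018</field>
        </element>
      </element>
    </element>
  </element>
</metadata>"""

DATE = "{*}element/{*}element[@name='date']"


def all_dates(root):
    # Any qualifier: mimics every date ending up in a plain dc:date.
    return [f.text for f in root.findall(DATE + "/{*}element/{*}element/{*}field[@name='value']")]


def issued_dates(root):
    # Only date.issued, which is what the "dc.date" rule copies.
    return [f.text for f in root.findall(DATE + "/{*}element[@name='issued']/{*}element/{*}field[@name='value']")]


root = ET.fromstring(SAMPLE)
print(all_dates(root))
print(issued_dates(root))
```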

23 of 24

Example: Abstract vs Description

  • Using XOAI and XML normalization rules, we could put the abstract in one field and other descriptive notes in a local “General Note” field. Because of how the XML is structured, we can tell whether a dc.description element has a qualifier from how deeply its value is nested.

<element name="description">
  <element name="none">
    <field name="value">This item is available only to currently enrolled UTSA students, faculty or staff.</field>
  </element>
  <element name="abstract">
    <element name="none">
      <field name="value">Abstract goes here.</field>
    </element>
  </element>
</element>

Non-abstract description fields are only one level below the main <element> tag.

Abstract fields are two levels below the main <element> tag.

24 of 24

Example: Abstract vs Description

// DESCRIPTION
// Add any dc.description fields that have a qualifier to dc.description. Currently, this is dc.description.abstract and dc.description.department
rule "dc.description - qualifier"
when
exist "//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='description']/*[local-name()='element']/*[local-name()='element']"
then
copy "//*[local-name()='record']/*[local-name()='metadata']/*[local-name()='metadata']/*[local-name()='element']/*[local-name()='element'][@name='description']/*[local-name()='element']/*[local-name()='element']/*[local-name()='field'][@name='value']" to "dc"."description"
end

The rule looks for two levels of elements below the “description” element, which is how we know the value has a qualifier (like “abstract”).
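The depth test can be sketched in Python: values two `element` levels below `description` are qualified (here, the abstract), while values one level below are plain descriptions, which the previous slide says go to a local General Note field. That second rule isn't shown, so treating "one level below" as its test is an assumption, as is the `dc` wrapper. Requires Python 3.8+.

```python
import xml.etree.ElementTree as ET

# Description sample from the previous slide, wrapped in an assumed "dc" schema element.
SAMPLE = """<metadata xmlns="http://www.lyncode.com/xoai">
  <element name="dc">
    <element name="description">
      <element name="none">
        <field name="value">This item is available only to currently enrolled UTSA students, faculty or staff.</field>
      </element>
      <element name="abstract">
        <element name="none">
          <field name="value">Abstract goes here.</field>
        </element>
      </element>
    </element>
  </element>
</metadata>"""

DESC = "{*}element/{*}element[@name='description']"


def qualified(root):
    # Two element levels (qualifier, then language) before the value: goes to dc.description.
    return [f.text for f in root.findall(DESC + "/{*}element/{*}element/{*}field[@name='value']")]


def unqualified(root):
    # One element level (language only) before the value: assumed to go to the General Note.
    return [f.text for f in root.findall(DESC + "/{*}element/{*}field[@name='value']")]


root = ET.fromstring(SAMPLE)
print("dc.description:", qualified(root))
print("General Note:", unqualified(root))
```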