1 of 34

Hyrax for RDR

Moira Downey

Will Sexton

Duke University Libraries

Fritz Freiheit

Nabeela Jaffer

University of Michigan

Jon Dunn

Indiana University Libraries Bloomington

Samvera Connect

October 11, 2018

2 of 34

What We’ll Cover Today

Introductions and Overview

Curation and Workflow

Branding and relationship to other platforms

Customizations

Data management and access

Questions

3 of 34

Introductions and Overview - Michigan

https://deepblue.lib.umich.edu/

4 of 34

5 of 34

Deep Blue Data - Sufia to Hyrax

  • Sufia beta: Feb 2016
  • Sufia 7.0: Sept 2016
  • Hyrax 1.0: Aug 2017
  • Hyrax 2.0: Nov 2018

6 of 34

Data Stats 2016-2019

7 of 34

Deep Blue Data Stats

As of Oct. 2nd, 2018

Public Totals

Collections: 7

Works: 132

File sets: 3700

Collections size: 211 GB

Works size: 8.15 TB

Totals

Collections: 36

Works: 278

File sets: 5870

Collections size: 229 GB

Works size: 11.1 TB

8 of 34

Big button to initiate the submission process.

Much of our work has been around mediated deposit: all submissions are handled by the curation team.

9 of 34

Introductions and Overview - Indiana University

10 of 34

Introductions and Overview - Indiana University

11 of 34

Introductions and Overview - Indiana University

  • Enterprise Scholarly Systems
  • Collaboration between:
    • IU Bloomington Libraries
    • IUPUI University Library
    • University Information Technology Services - Enterprise Systems
  • Goal: Common enterprise-level repository infrastructure for IU (7 campuses)
    • Samvera/Fedora technology base
  • Hyrax data repository work initial pilot for Bloomington
  • Collaboration with University of Michigan

12 of 34

Curation and submission workflow

Michigan:

  • Only University of Michigan users can deposit
  • Started with an open deposit system, which led to orphan deposits
  • Now one-step mediation. No pre-submit mediation unless the researcher contacts RDS before submission
  • One-step mediation reduces user confusion and frustration
  • Research Data Service group reviews the dataset and publishes it
  • Users can mint a DOI

13 of 34

Curation and submission workflow

Duke:

  • Pre-publication workflow that allows staff to evaluate datasets before ingest
  • No direct uploading of materials by depositors
  • Submission web-form directly coded into the application
    • captures metadata in csv for batch ingest
    • integration with Box API allows depositors to share files
  • Curation happens on local machines: visual inspection of files for PII and PHI, checks for adequate documentation, preservation-friendly file formats, etc.
  • Data ingest and publication, DOI minting carried out by repository staff
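
The "captures metadata in csv for batch ingest" step above could look roughly like the following. This is a minimal sketch, not Duke's implementation: the column names in `FIELDS` and the pipe separator for multi-valued fields are assumptions made for illustration, since the slides do not give the metadata profile.

```python
import csv
import io

# Hypothetical column names; the real metadata profile is not given in the slides.
FIELDS = ["title", "creator", "description", "box_folder_url"]


def submissions_to_csv(submissions):
    """Serialize web-form submissions (a list of dicts) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        extrasaction="ignore",  # drop form fields that are not in the profile
        lineterminator="\n",
    )
    writer.writeheader()
    for row in submissions:
        # Join multi-valued fields with a pipe so each work stays on one row.
        flat = {
            key: "|".join(value) if isinstance(value, list) else value
            for key, value in row.items()
        }
        writer.writerow(flat)  # missing fields are written as empty cells
    return buffer.getvalue()
```

Staff could then review the resulting file alongside the shared Box folder before running the batch ingest.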

14 of 34

15 of 34

16 of 34

Curation and submission workflow

Indiana:

  • Currently (DSpace) have highly mediated submission
  • Much manual intervention/file movement due to lack of integration with storage services
  • Manual DOI minting via DataCite
  • Desire one-step mediation / unmediated; automatic DOI assignment, PII checking

17 of 34

Branding and relationship to other platforms

Michigan:

  • One platform: Deep Blue
    • Deep Blue Data
    • Deep Blue Docs (currently on DSpace)
  • Working on migrating Deep Blue from DSpace to Hyrax
  • Researchers can add a link to the IR; talking to ICPSR to improve discoverability
  • https://deepblue.lib.umich.edu/

18 of 34

19 of 34

Branding and relationship to other platforms

Duke:

  • DDR-RD is one part of overall support and outreach for RDM
  • Attempt at catchy branding did not succeed
  • However, considering work to unify with DSpace IR
  • No one feels sentimental about “DukeSpace” (IR) brand
  • Unified research portal would implement consistent branding across Hyrax & DSpace

20 of 34

Data repository support is one part of RDM program at DUL

21 of 34

DDR-RD is in turn one part of data repository support within RDM

22 of 34

RDM is also one part of the Libraries’ overall support for open research

23 of 34

Can we share branding, naming, and portal development across these platforms to present a unified approach to support for open research?

24 of 34

Branding and relationship to other platforms

Indiana:

  • Open questions about branding
  • Relationship to IUScholarWorks
  • Relationship to campus-specific branding
  • Interactions w/ IR, digital collections repo, streaming service (Avalon)

25 of 34

Customizations

Michigan:

  • Custom metadata
  • DOI
  • Tombstone
  • Globus/Download All
  • Custom UI Ingest process
  • Provenance log
    • CRUD: who, what, when
  • Cosign
  • Virus Check: ClamAV

26 of 34

Customizations

Duke:

  • Stripped out self-deposit/curation functionality
  • Versioning of works (datasets)
  • Vertical breadcrumb trail to show work hierarchy and context
  • Bulk import and export, including nested works - adapted from Hyku

27 of 34

Version widget includes a note for the versioned item. Versioning uses the Dublin Core replaces / isReplacedBy relations, with DOIs as the identifiers.

Warning banner displayed for replaced version of a work.
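
A minimal sketch of the version linking described above, assuming each version of a work carries its own DOI. The `WorkVersion` class and function names are hypothetical and only illustrate the replaces / isReplacedBy pairing; they are not Hyrax model code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkVersion:
    doi: str
    replaces: Optional[str] = None        # DOI of the version this one supersedes
    is_replaced_by: Optional[str] = None  # DOI of the version that supersedes this one


def link_versions(old: WorkVersion, new: WorkVersion) -> None:
    """Record the reciprocal relations between two versions."""
    new.replaces = old.doi
    old.is_replaced_by = new.doi


def needs_warning_banner(work: WorkVersion) -> bool:
    """A replaced version is the one that should show the warning banner."""
    return work.is_replaced_by is not None


# Made-up DOIs for illustration only.
v1 = WorkVersion(doi="10.5072/example-v1")
v2 = WorkVersion(doi="10.5072/example-v2")
link_versions(v1, v2)
```

After linking, only `v1` would trigger the banner, while `v2` points back to it through `replaces`.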

28 of 34

Vertical breadcrumb trail in right-hand sidebar shows nesting and hierarchy for works and their children.

Suppressed the OOTB horizontal breadcrumb trail

29 of 34

Customizations

Indiana: (future)

  • Custom metadata
  • DOI minting
  • Data citations
  • Directory structure / zip file browsing and preview

30 of 34

Possible Collaboration on features:

  • Developing a means to provide a suggested citation to the user as a field within the metadata record.
  • API support
  • Improving search and discovery in our repositories.
  • Ability for admins to generate reports on admin and user activity (depositor as well as consumer) in the repository. May or may not include Google Analytics integration.
  • Creating a means for the repository to recognize which files serve as documentation for the data set and to highlight these files as documentation to the user.
  • File and folder structure
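
As a starting point for the suggested-citation idea, a citation string could be derived from fields already in the metadata record. The sketch below is illustrative only: the function name, the field names, and the "creator (year). title. publisher. identifier" pattern are assumptions loosely modeled on common data-citation practice, not an agreed format among the three institutions.

```python
def suggested_citation(creators, year, title, publisher, doi):
    """Build a citation string from metadata fields (hypothetical field names)."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# Made-up record for illustration only.
citation = suggested_citation(
    creators=["Author, A.", "Author, B."],
    year=2018,
    title="Example Dataset",
    publisher="Example Repository",
    doi="10.5072/example",
)
```

Storing the result as its own field would let the repository display it on the work page without recomputing it.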

31 of 34

Data management and access

Michigan:

  • Data migration from Sufia 7 to Hyrax 1
    • In-place migration
  • Data migration from Hyrax 1 to 2
    • Export/import, keeping the Hyrax ID
    • Checksum validation on import
    • Import process brings in the metadata and files and runs the same characterization and derivative generation as the self-deposit process
  • Data access
    • Zipped download
    • Globus download
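
The "checksum validation on import" step could be sketched as follows. The slides do not say which digest algorithm or manifest format Michigan uses, so SHA-256 and the path-to-digest dictionary here are assumptions, and the function names are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest, reading in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_import(manifest):
    """Return the paths whose current digest differs from the exported one.

    `manifest` maps file path -> hex digest recorded at export time.
    """
    return [
        path
        for path, expected in manifest.items()
        if file_sha256(path) != expected.lower()
    ]
```

An empty list means every exported file arrived intact; any returned path would be held back from the import.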

32 of 34

Data management and access

Duke:

  • Migration from custom Fedora 3/Samvera stack to Hyrax
    • scripted metadata and file export from the old stack
    • metadata munged and crosswalked to "new" metadata profile
    • ingested work on behalf of old depositors (proxy deposit)
    • final step: re-routed persistent identifiers (ARKs, DOIs) with EZID and Crossref clients
  • Currently using Box for file transfer, but Box has serious limitations
    • max file size is 15 GB
    • large deposits will require alternate workflow
    • interest in leveraging Globus
  • Full dataset available through zipped download
    • still limited to authenticated users
    • no real experience with large data

33 of 34

Data management and access

Indiana:

  • Scholarly Data Archive (SDA) HSM system for large files
    • Centrally-supported university resource with multi-petabyte capacity
    • Tape-based, using IBM HPSS management software
    • Mirroring between Bloomington and Indianapolis data centers
    • Low/no cost to Libraries
    • Currently used for files > 150 MB, may increase threshold
  • Need for multiple upload/ingest methods
    • Currently have normal DSpace HTTP upload and Box upload widget
    • Need to support direct ingest from SDA, other research storage systems, Globus, etc.
  • Multiple download methods
  • Export of preservation packages to SDA/DPN/etc.
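
The size threshold for SDA storage amounts to a simple routing rule. A minimal sketch, assuming decimal megabytes and hypothetical target names ("sda" and "repository"); the threshold is a parameter because the slide notes it may increase.

```python
# 150 MB threshold from the slide; decimal megabytes are an assumption.
SDA_THRESHOLD_BYTES = 150 * 1000 * 1000


def storage_target(size_bytes, threshold=SDA_THRESHOLD_BYTES):
    """Route files larger than the threshold to SDA, everything else to the repository."""
    return "sda" if size_bytes > threshold else "repository"
```

Raising the threshold later would only require passing a different `threshold` value.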

34 of 34

Thank you!

Questions?