1 of 27

Hydra in a Box:

Building A Next-Generation Platform for Digital Collections

Hannah Frost, Stanford University

Gretchen Gueguen, DPLA

Mark A. Matienzo, DPLA

DPLAFest 2016 — April 14, 2016

2 of 27

Project Overview

  • A Time for Change
  • The Vision
  • Project Partners
  • Project Goals
  • Timeline

3 of 27

A Time for Change

  • Conversations between Stanford University, DPLA, and DuraSpace informed project design
  • Current digital collections platforms originate in an earlier phase of the web, which explains their current limitations
  • Infrastructure needs in the DPLA Hub network
    • Legacy systems unable to leverage modern affordances of the web
    • Lack of scalable and sustainable aggregation workflows
    • Lack of support for linked data and metadata enrichment
    • Perceived lack of “obvious choices” for replacement systems for digital collections

4 of 27

The Vision

  • A product and service that is easy to use and easy to integrate
  • Reduce barriers (including cost) to DPLA contribution
  • Allow digital collections to be not just on the web, but of the web
  • Expand and diversify both the DPLA and Hydra communities

5 of 27

Project Partners

6 of 27

Project Goals

  • Development of a turnkey, Hydra-based application that leverages and improves on core components
  • Development/integration of metadata aggregation & enrichment tools
  • Connect components with DPLA hubs, current Hydra partners, and prospective Hydra adopters
  • Work toward a hosted service

7 of 27

Timeline

  • May 2015-November 2017 (30 months)
  • Design process: May 2015-March 2016
  • Development: March 2016-November 2017
  • Service development and community engagement: throughout project

8 of 27

Design Phase

Discovery Phase (Fall 2015)

  • Literature review and product/service analysis
  • Surveys, interviews, and focus groups
  • Community outreach

Information Architecture (Winter 2016)

  • User requirements and personas
  • Requirements - functional & technical
  • Models and wireframes

Visual Design (Spring 2016)

9 of 27

Design Phase

10 of 27

Key Areas of Progress

Design, Requirements and Specifications team:

  • Community survey insights
  • Analysis of user interviews and focus groups
  • Content type requirements for data modeling

11 of 27

Community Survey

256 complete responses

311 repositories

Mostly small, US academic libraries

12 of 27

Survey Insights

  • Expectations of our project
  • Satisfaction levels
    • Users of hosted services tend to be more satisfied than users of local deployments
  • Strengths and weaknesses of existing repository options
  • 53% plan to migrate to another system
    • Most to a Fedora-based solution
    • Rest are “not sure” what’s next

13 of 27

User Interviews

  • Completed 21 individual or small-group interviews and 4 focus groups
    • 55 individuals in total
    • 46 institutions in the US and Canada
    • 29 hours of recorded content
  • Interviews held either in person or by videoconference; focus groups held in person
  • Coded and analyzed the recorded content to further identify potential requirements

14 of 27

Content Analysis Visualizations

15 of 27

Interviewee’s Notable Quote

“... How many of these different systems do you need? You can have your digital collections with images and documents, you can have your IR, you can have your digital preservation system, and you can add Omeka on top of that to do exhibits. It's just too much to have four or five different systems.”

16 of 27

Content Type Analysis

17 of 27

Early Technical Exploration

  • Deploying to the Cloud
    • Leverage services for institutions without local infrastructure
  • Simplifying installation and configuration
    • Users should not need to be technical to set up and maintain an instance
  • Determining a starting point for application development
    • Build on existing community-based work if possible
    • Sufia 7.0 - actively under development

18 of 27

Repository Development

  • Assembled an all-star technical team
    • 10 engineers: software development, data modeling, development operations
    • Contributions from other institutions (Penn State, maybe others)
    • Led by Michael Giarlo
  • First work cycle: March - June 2016
    • Series of one-week sprints
    • Recorded demos of iterative progress, available to the public
  • First milestone: Deploy our application based on Sufia 7 to the cloud
    • Priority content types
    • Configuration UI
    • Administrative dashboard
    • Batch import

19 of 27

Follow our progress

20 of 27

Aggregator Needs

  • More flexible mapping standard than XSLT
  • Ability to harvest from multiple sources
  • Reconciliation services that utilize linked data
  • Enhanced quality control tools
  • Ability to normalize and create consistencies in data values
  • Easily get data in and out
  • Robust enough to handle multiple feeds and multiple sources
  • Processes to move data from one repository to another resemble aggregation workflows
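
As a rough illustration of the need to normalize and create consistency in data values, the sketch below collapses variant spellings of a value into one preferred form. The function name, the lookup table, and the sample values are assumptions made up for this example; they are not taken from any DPLA or Hydra tool.

```python
import re

# Hypothetical lookup table: known variant (lowercased) -> preferred form.
PREFERRED_FORMS = {
    "eng": "English",
    "en": "English",
    "english": "English",
    "spa": "Spanish",
    "es": "Spanish",
    "spanish": "Spanish",
}


def normalize_values(values):
    """Trim and collapse whitespace, map known variants, drop duplicates.

    Values that are not in the lookup table are kept as cleaned text,
    so nothing is silently discarded except empty strings.
    """
    seen = set()
    result = []
    for raw in values:
        cleaned = re.sub(r"\s+", " ", raw).strip()
        if not cleaned:
            continue
        preferred = PREFERRED_FORMS.get(cleaned.lower(), cleaned)
        if preferred not in seen:
            seen.add(preferred)
            result.append(preferred)
    return result
```

A real aggregator would likely reconcile against a shared vocabulary or linked data service instead of a hard-coded table; the table here only keeps the example self-contained.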

21 of 27

DPLA’s Aggregation System, Heiðrún

Three Main Functions

  • Harvest
    • Source agnostic
  • Map
    • Mapping DSL expressed in Ruby
    • Maps to RDF triples
  • Enrich
    • Modular enrichments written to normalize and enhance data
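
The three functions above can be sketched as a small pipeline. This is a minimal illustration in Python, not Heiðrún's actual code (its mapping DSL is expressed in Ruby). Every name here (`Record`, `harvest`, `map_record`, `enrich`, `strip_whitespace`, `MAPPING`) is hypothetical, and the choice of Dublin Core Terms predicates is an assumption for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    identifier: str
    fields: dict
    triples: list = field(default_factory=list)


def harvest(source):
    """Source-agnostic harvest: accept any iterable of (id, dict) pairs."""
    for identifier, raw in source:
        yield Record(identifier=identifier, fields=dict(raw))


# Hypothetical mapping: source field name -> predicate IRI.
MAPPING = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}


def map_record(record, mapping=MAPPING):
    """Turn mapped source fields into (subject, predicate, object) triples."""
    for name, predicate in mapping.items():
        for value in record.fields.get(name, []):
            record.triples.append((record.identifier, predicate, value))
    return record


def strip_whitespace(record):
    """One modular enrichment: trim stray whitespace from object values."""
    record.triples = [(s, p, o.strip()) for s, p, o in record.triples]
    return record


def enrich(record, enrichments):
    """Apply each enrichment in order; each takes and returns a Record."""
    for enrichment in enrichments:
        record = enrichment(record)
    return record


def run(source, enrichments=(strip_whitespace,)):
    """Harvest, map, then enrich every record from the source."""
    return [enrich(map_record(r), enrichments) for r in harvest(source)]
```

Keeping each enrichment as a separate function that takes and returns a record is one way to keep enrichments modular: they can be added, removed, or reordered per provider without touching the harvest or mapping steps.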

22 of 27

[Diagram: aggregator architecture. Labeled components: harvest, map, enrich; Marmotta; original record data store; partner data store; Dashboard; QA; staging; production; mapping; enrichment profile; institution profile; User Interface; Aggregator]

23 of 27

[Diagram: aggregator architecture, variant. Labeled components: harvest, map, enrich; Marmotta; original record data store; original data store; exported data store; Dashboard; QA; mapping; enrichment profile; institution profile; User Interface; Aggregator]

24 of 27

Roadmap

  • Completing requirements now
  • April - July
    • Design remaining infrastructure
    • Develop user interface requirements further
  • August - November
    • Develop dashboard tools
    • Analyze convergence points with repository
    • Plan for improvements to QA interface
    • Begin User Testing
  • November - March 2017
    • Develop QA improvements
    • Refine interfaces and infrastructure
    • Implement job scheduling

25 of 27

Developing a Hosted Service

  • Project partners collaborating to develop requirements for a cloud-hosted service based on the repository product under development
  • Market research underway, starting with analysis of information discovered during the design phase
  • Evaluating tiered service models depending on needs of potential adopters
  • Significant technical work to focus on developing a shared, maintainable, and scalable service

26 of 27

More Information

Visit our website and blog: http://hydrainabox.org/

Follow us on Twitter: @HydraInABox

Public information list

hybox-info@googlegroups.com

Contact us

hybox-contact@googlegroups.com

27 of 27

Thank You!

Hannah Frost
hfrost@stanford.edu
@feefifofannah

Gretchen Gueguen
gretchen@dp.la
@G_AmSpinnrade

Mark A. Matienzo
mark@dp.la
@anarchivist

http://bit.ly/dplafest-hybox