ARKs in the Open: 3.2 Billion Persistent Identifiers
April 2020
Why care about ARK identifiers?
Because persistent, reliable web links are lacking.
Wanted: a flexible, low cost, vendor- and software-independent persistent identifier
ARK (Archival Resource Key)
University of California Berkeley, Smithsonian National Museum, National Library of France, University of Chicago, Musée du Louvre, Family Search, British Library, Google
Internet Archive, Bodleian Libraries, Berkeley Law Library, Bibliothèque Mazarine, New York Public Library, French National Archives, National Library of Austria, Library and Archives Canada
ARK anatomy
A labelled URL with a globally unique identity inside it
https://n2t.net/ark:/12345/fk1234
makes ARK actionable (the resolver)
core globally unique identity (independent of web and hostname)
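The anatomy above can be sketched in a few lines of Python. This is only an illustration, not part of any ARK tooling: the names `Ark`, `parse_ark`, and `core_identity` are hypothetical, and the example URL is the placeholder one from the slide.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Ark(NamedTuple):
    resolver: str  # scheme and host that make the ARK actionable
    naan: str      # Name Assigning Authority Number
    name: str      # name assigned by that authority


def parse_ark(url: str) -> Ark:
    """Split an ARK URL into its resolver and its core identity."""
    parts = urlsplit(url)
    label = "ark:"
    start = parts.path.find(label)
    if start == -1:
        raise ValueError(f"no 'ark:' label found in {url!r}")
    # Everything after the label is NAAN/name; tolerate a missing slash.
    core = parts.path[start + len(label):].lstrip("/")
    naan, _, name = core.partition("/")
    if not naan or not name:
        raise ValueError(f"incomplete ARK in {url!r}")
    return Ark(f"{parts.scheme}://{parts.netloc}", naan, name)


def core_identity(ark: Ark) -> str:
    """The hostname-independent part of the identifier."""
    return f"ark:/{ark.naan}/{ark.name}"


ark = parse_ark("https://n2t.net/ark:/12345/fk1234")
print(ark.resolver)        # https://n2t.net
print(core_identity(ark))  # ark:/12345/fk1234
```

Because the core identity does not include the hostname, the same ARK can be served from a different resolver without changing the identifier itself.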
What are ARKs used for?
Why ARKs and not DOIs
(or Handles or PURLs or URNs)?
ARKs and DOIs
DOIs (Digital Object Identifiers) – publishing industry solution
ARKs – cultural heritage solution
The Covenant of the ARK
The ARK scheme
will not charge fees to create or use ARKs
will not limit the number of ARKs you assign
will not limit the kind of content you identify
will not require metadata, nor even persistence
will not mandate use of any particular resolver
Getting involved in ARKs
Support open infrastructure
ARKs at the Smithsonian Institution
Bess Missell
Metadata Librarian
Smithsonian Libraries
missellb@si.edu
Overview
The Smithsonian Libraries & The Smithsonian Institution.
Collection metadata & multimedia objects.
Smithsonian Libraries is a network of 21 specialized research libraries, as well as central support services which include Smithsonian Research Online, a bibliography of Smithsonian publication citations and the Institution’s repository. library.si.edu
The Smithsonian Institution is the world’s largest museum, education, and research complex, with 19 museums and the National Zoo. www.si.edu
19 Museums + 1 Zoo
23.2M Visits by Public
155.5M Museum Objects & Specimens
2.2M Library Volumes
2,633 Scholarly Publications
154M Website Visitors
16.6M Social Media Followers
21 Libraries
2.2M Library Volumes
80K Smithsonian Research Online
772K Website visitors
239K Social Media Followers
The Smithsonian is Assigning ARKs to our Collection Systems
Examples include records and images for:
Scientific specimens from the National Museum of Natural History
http://n2t.net/ark:/65665/381440f27-3f74-4eb9-ac11-b4d633a7da3d
Cultural artifacts from the National Museum of American History http://n2t.net/ark:/65665/ng49ca746b2-42dc-704b-e053-15f76fa0b4fa
Sculpture from the Freer Gallery of Art & Arthur M. Sackler Gallery http://n2t.net/ark:/65665/ye3080ce305-a705-49cc-a70d-99aff8cb65da
Photographs from the National Museum of African American History and Culture
http://n2t.net/ark:/65665/fd5ad97cb86-caaf-4209-8fde-98d70f52f072
Paintings from the Smithsonian American Art Museum http://n2t.net/ark:/65665/vk7a466371d-0413-451f-bd76-ca0becc46f94
National Museum of Natural History
2015
Smithsonian Open Access Project
February 26th, 2020
Over 15 million ARKs and counting ...
ARKs were chosen because…
Courtesy of the Smithsonian Libraries
Alexandre, Arsène. Noé dans son arche. Combet et Cie, 1902, https://doi.org/10.5479/sil.720005.39088010288199
https://library.si.edu/digital-library/book/noeydanssonarch00alex
Smithsonian Libraries…
http://n2t.net/ is the resolver that passes the web call to the EZID service, which then uses the Name Assigning Authority Number (NAAN) to identify the registered naming authority. The Smithsonian has also registered datasetIDs (or shoulders) so that EZID passes the web traffic to a specific Smithsonian collection system.
vk7 in the ARK above is registered to metadata records in the Smithsonian American Art Museum (SAAM) collection management system. If vk7 were replaced with bj9, the call would instead go to the image delivery server for SAAM.
Each Smithsonian collection system is configured to receive the web call from EZID, read the datasetID, and direct the call to the correct server for metadata records or multimedia.
vc9 resolves to the Cooper Hewitt image server https://collection.cooperhewitt.org/ark/vc9
ye3 resolves to the Freer Sackler metadata server https://collections.si.edu/search/record/ark:/65665/ye3
jy5 resolves to the Freer Sackler image server https://ids.si.edu/ids/deliveryService?id=ark:/65665/jy5
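The routing described above can be sketched as a lookup from datasetID to target. This is a minimal illustration, not how EZID or N2T is implemented: the `ROUTES` table holds only the three examples listed above, `route` is a hypothetical name, and the rule that the rest of the ARK name is appended to the registered target is an assumption.

```python
SI_NAAN = "65665"  # the NAAN that appears in the Smithsonian ARKs above

# datasetID (shoulder) -> registered target, copied from the examples above.
ROUTES = {
    "vc9": "https://collection.cooperhewitt.org/ark/vc9",
    "ye3": "https://collections.si.edu/search/record/ark:/65665/ye3",
    "jy5": "https://ids.si.edu/ids/deliveryService?id=ark:/65665/jy5",
}


def route(ark: str) -> str:
    """Return the collection-system URL for a Smithsonian ARK."""
    prefix = f"ark:/{SI_NAAN}/"
    if not ark.startswith(prefix):
        raise ValueError(f"not a Smithsonian ARK: {ark!r}")
    name = ark[len(prefix):]
    dataset_id, rest = name[:3], name[3:]
    if dataset_id not in ROUTES:
        raise LookupError(f"unregistered datasetID: {dataset_id!r}")
    # Assumption: the remainder of the name is appended to the target.
    return ROUTES[dataset_id] + rest


print(route("ark:/65665/ye3080ce305-a705-49cc-a70d-99aff8cb65da"))
```

Swapping only the three-character datasetID is enough to send the same call to a different server, which is the behaviour described for vk7 and bj9.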
Using EZID, I register each Smithsonian collection system with our NAAN and a datasetID, along with the URL to which the datasetID should resolve.
The Smithsonian wrote a datasetID schema which I follow when I create and register new collection systems:
Two randomly selected lowercase letters (no lowercase L, rm, nm, or fu)
+
One randomly selected number (2-9)
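As a rough sketch, the schema above could be expressed in Python as follows. The reading of the exclusions (the letter l is never used, and the pairs rm, nm, and fu are never used) is an interpretation of the slide, and the function names and the `existing` uniqueness check are hypothetical additions.

```python
import secrets
import string

LETTERS = string.ascii_lowercase.replace("l", "")  # no lowercase L
DIGITS = "23456789"                                 # one number, 2-9
BANNED_PAIRS = {"rm", "nm", "fu"}                   # excluded letter pairs


def is_valid_dataset_id(value: str) -> bool:
    """Check a candidate against the two-letters-plus-digit schema."""
    return (
        len(value) == 3
        and value[0] in LETTERS
        and value[1] in LETTERS
        and value[:2] not in BANNED_PAIRS
        and value[2] in DIGITS
    )


def new_dataset_id(existing=frozenset()) -> str:
    """Draw random candidates until one is valid and not already in use."""
    while True:
        candidate = (
            secrets.choice(LETTERS)
            + secrets.choice(LETTERS)
            + secrets.choice(DIGITS)
        )
        if is_valid_dataset_id(candidate) and candidate not in existing:
            return candidate


print(new_dataset_id(existing={"vk7", "ye3", "jy5"}))
```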
Image from the website: https://ezid.cdlib.org/
Each Smithsonian collection system is now configured to automatically generate an ARK when a metadata or multimedia record is saved. The ARK includes the SI NAAN and the datasetID assigned to the collection system.
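A minimal sketch of minting on save, under stated assumptions: the example ARKs earlier in this talk look like a datasetID followed by a UUID, so the code below uses that pattern, but the actual generation logic inside each collection system is not described here. `mint_ark` and `actionable_url` are hypothetical names, and the choice of a random (version 4) UUID is a guess.

```python
import uuid

SI_NAAN = "65665"            # Smithsonian NAAN, as seen in the examples
RESOLVER = "http://n2t.net"  # resolver used in the examples


def mint_ark(dataset_id: str) -> str:
    """Build a new ARK from the NAAN, the system's datasetID, and a UUID."""
    # Assumption: a fresh random UUID serves as the unique part of the name.
    return f"ark:/{SI_NAAN}/{dataset_id}{uuid.uuid4()}"


def actionable_url(ark: str) -> str:
    """Prefix the resolver so the ARK can be followed as a web link."""
    return f"{RESOLVER}/{ark}"


ark = mint_ark("vk7")
print(actionable_url(ark))
```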
Challenges for the ARK implementation included…
The datasetID needs to be included in the URL: https://collections.si.edu/search/record/ark:/65665/ye3
Phase II of Open Access
Image from the website: https://n2t.net/e/ark_ids.html
SI media server
National Postal Museum SI TMS
AHM media server
CH media server
NH media server
National Museum of African Art SI TMS
American History Museum Mimsy XG
Cooper Hewitt Smithsonian Design Museum TMS
Natural History Museum
EMU
Plus 12 more systems …
Commercial resolver: n2t.net
Thank you!
ARKs in the Portico Archive
April 23rd 2020
Karen Hanson, Senior Research Developer
Overview
Portico workflow
Batch of files received e.g. PDF and XML version of articles in a journal issue
Files checked, normalized, and packaged to prepare for preservation
Resulting “archival units” deposited into archive. Each unit = e.g. 1 article
Archival unit content structure
Archival Unit
Content Units
Functional Units
Storage Units
Article A
Article A: Version 1
Article A: Version 2
Marked up full text
Page images rendition
Figure graphic component
Publisher supplied XML
Normalized XML (JATS)
JPEG
(high resolution)
PNG
(low resolution)
Structure described in metadata
Archival unit:
phc5qbrw2a.zip
Open BagIt “Bag”
Storage Units
Publisher supplied XML
Normalized XML (JATS)
JPEG
(high resolution)
PNG
(low resolution)
Structure described in metadata
Preservation
Metadata
Use of ARKs supports a self describing archive
Full text XML with image references
ark:/12345/rmkd92kd
ark:/12345/rmkp3zr8
ark:/12345/rmk7fzqk
ark:/12345/rmk2kdjq
<fig id="fig1" position="float">
<label>Fig. 1</label>
<caption>
<p>Example figure!</p>
</caption>
<graphic
position="anchor"
xlink:href="ark:/12345/rmkp3zr8"
alt-version="no"
xlink:type="simple"/>
</fig>
[Diagram arrows, each labelled "references", link the full-text XML to the ARKs listed above]
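To illustrate how ARK references make the archive self-describing, the sketch below pulls every `xlink:href` that holds an ARK out of a piece of full-text XML. It is not Portico's code: `ark_references` is a hypothetical name, and the sample wraps the figure markup above in an `article` element that declares the xlink namespace so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from typing import List

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE = """\
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <fig id="fig1" position="float">
    <label>Fig. 1</label>
    <caption><p>Example figure!</p></caption>
    <graphic position="anchor" xlink:href="ark:/12345/rmkp3zr8"
             alt-version="no" xlink:type="simple"/>
  </fig>
</article>
"""


def ark_references(xml_text: str) -> List[str]:
    """List the ARKs that the XML points to via xlink:href attributes."""
    root = ET.fromstring(xml_text)
    return [
        element.get(XLINK_HREF)
        for element in root.iter()
        if element.get(XLINK_HREF, "").startswith("ark:")
    ]


print(ark_references(SAMPLE))
```

Because the links are ARKs rather than file paths, they stay meaningful even if the files are renamed or moved within the archive.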
What did we assign billions of ARKs to?
… over 2 billion ARKs
ARK resolver use case: Enhanced Monographs
EPUB Challenge: Remote Resources
remote resource
visually embedded or linked
Problem of external content embedded in EPUBs
What if we could resolve an ARK to the video?
https://ids.portico.org/ark:/12345/rmkrq29x8
Thank you!
karen.hanson@ithaka.org
Thanks also to my colleague Amy Kirchhoff for helping me put together this presentation.
ARK Identifiers In Genealogy
FamilySearch International
Presented at CNI, Spring 2020
N. Thomas Creighton
tc@familysearch.org
FamilySearch International - A Brief Introduction
www.familysearch.org
Artifact Processing Abstraction
Searching For Ancestors
https://www.familysearch.org/ark:/61903/1:1:K98H-2G2
Maudie M. Creighton --
.../ark:/61903/1:1:K98H-2GL
David M. Creighton --
.../ark:/61903/1:1:K98H-2GG
Robert T. Creighton --
.../ark:/61903/1:1:K98H-2GP
Thomas Percy Creighton Details --
.../ark:/61903/4:1:25V8-3J5
Census page with context --
.../ark:/61903/3:1:3QSQ-G9MT-N9ZF?personaUrl=%2Fark%3A%2F61903%2F1%3A1%3AK98H-2G2
ark:/61903/3:1:3QSQ-G9MT-N9ZF
?i=35&personaUrl=%2Fark%3A%2F61903%2F1%3A1%3AK98H-2G2
https://www.familysearch.org/ark:/61903/3:1:3QSQ-G9MT-N9ZF
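The URLs above can be taken apart with the standard library. This is a sketch only: `split_familysearch_ark` is a hypothetical helper, and treating everything before the last colon of the name as the namespace (for example 1:1 or 3:1) is an assumption based on the examples shown.

```python
from urllib.parse import parse_qs, urlsplit


def split_familysearch_ark(url: str) -> dict:
    """Break a FamilySearch ARK URL into NAAN, namespace, and local id."""
    parts = urlsplit(url)
    label, naan, name = parts.path.lstrip("/").split("/", 2)
    if label != "ark:":
        raise ValueError(f"not an ARK URL: {url!r}")
    # Assumption: the namespace is everything before the last colon.
    namespace, _, local_id = name.rpartition(":")
    # parse_qs percent-decodes values, so the embedded ARK becomes readable.
    persona = parse_qs(parts.query).get("personaUrl", [None])[0]
    return {
        "naan": naan,
        "namespace": namespace,
        "local_id": local_id,
        "persona_ark": persona.lstrip("/") if persona else None,
    }


info = split_familysearch_ark(
    "https://www.familysearch.org/ark:/61903/3:1:3QSQ-G9MT-N9ZF"
    "?i=35&personaUrl=%2Fark%3A%2F61903%2F1%3A1%3AK98H-2G2"
)
print(info["namespace"])    # 3:1
print(info["persona_ark"])  # ark:/61903/1:1:K98H-2G2
```

The query string is context for the page view (which persona to highlight on the census image); the identity of the image itself is carried entirely by the ARK in the path.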
https://www.familysearch.org/ark:/61903/1:1:K98H-2G2 (Thomas Percy Creighton) as json
A small snippet of 3438 lines:
Organization and Volume of Minting
Namespace or Name Assigning Authority | Description | Approximate Count (Millions) | Annual Increase (Millions) |
1:1 | Historical record persona | 8,800 | 1,511 |
1:2 | Historical record | 5,300 | 452.27 |
2 (1-3) | Pedigree data | 1,500 | 73.05 |
3 (1-4) | Digital images of documents | 4,300 | 344.64 |
4 | FamilyTree person records | 1,400 | 43.2 |
Total | | 21,300 | 2,424 |
Managing Access and Routing