1 of 131

Interacting with Standards

Hands-on Fedora

Esmé Cowles, Yinlin Chen, and Mike Durbin

http://bit.ly/c4l18-fedora

2 of 131

Getting Started

Esmé Cowles

3 of 131

Introductions

  • Esmé Cowles, Princeton University Library
  • Yinlin Chen, Virginia Tech Libraries
  • Mike Durbin, University of Virginia Library

4 of 131

Schedule

9:30 - 10:00   Getting Started
10:00 - 10:10  Fedora API Specification
10:10 - 11:00  Linked Data Platform
11:00 - 11:20  Break
11:20 - 11:30  Fixity
11:30 - 11:40  Versioning
11:40 - 12:00  Activity Streams
12:00 - 12:20  Web Access Control

5 of 131

Environment Setup

https://github.com/fcrepo4-exts/fcrepo4-vagrant/archive/fcrepo4-vagrant-4.7.5-RC-2.zip

$ unzip fcrepo4-vagrant-4.7.5-RC-2.zip
$ cd fcrepo4-vagrant-fcrepo4-vagrant-4.7.5-RC-2
$ vagrant up

Pre-requisites

6 of 131

Vagrant Environment

  • Vagrant Services
  • Fedora User Accounts
    • testuser, password: password1
    • adminuser, password: password2
    • fedoraAdmin, password: secret3 (admin account)

7 of 131

Fedora

API Specification

Esmé Cowles

8 of 131

Why?

  • Conflicting priorities
  • API stability
  • Snowflakes
  • Standards

9 of 131

Standards, Standards, Standards

  • Linked Data Platform (CRUD)
  • HTTP (Fixity)
  • Memento (Versioning)
  • Activity Streams (Events)
  • Web Access Control (Auth)
  • REST/HATEOAS

10 of 131

Status

  • Candidate Recommendation
    • https://fedora.info/spec/
    • Time for comments
    • Two implementations
  • Implementations

11 of 131

Linked Data Platform

Yinlin Chen

ylchen@vt.edu

Virginia Tech Libraries

12 of 131

Agenda

  • Introduction to Linked Data
  • Linked Data Platform and Fedora
  • Fedora resource management (CRUD)
  • Hands-on: Fedora CRUD
    • Fedora HTML UI
    • Fedora HTTP API

13 of 131

Learning Outcomes

  • Learn linked data terminology
  • Learn linked data practices
  • Learn about the Linked Data Platform
  • Familiarity with Linked Data in Fedora
  • Familiarity with Fedora resource management

14 of 131

Linked Data Overview

  • Helps to dismantle silos
  • Human and machine readable
  • Graph based representation of data

“...is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. ... This enables data from different sources to be connected and queried.”

- https://en.wikipedia.org/wiki/Linked_data

15 of 131

Resource Description Framework - RDF

  • Structure of an RDF statement (a.k.a. triple):

<subject> <predicate> <object>

  • <subject> is an entity
  • <predicate> represents a relationship between subject and object
  • <object> of a statement may be an entity or literal (string)

16 of 131

[Figure: RDF graph centered on http://dbpedia.org/resource/How_to_Train_Your_Dragon, with one edge annotated as Subject, Predicate, Object.]

  • Predicates shown: hasLabel, datePublished, hasAuthor, genre, hasAbstract
  • Resource objects: http://dbpedia.org/resource/Cressida_Cowell, http://dbpedia.org/resource/Category:Fictional_Vikings
  • Literal objects: “How to Train Your Dragon”, “2003”, “Cressida Cowell (born 15 April 1966) is an English children's author, popularly known for the novel series, How to Train Your Dragon…”

Data from http://dbpedia.org/page/How_to_Train_Your_Dragon

17 of 131

Ontologies

Ontologies are formal specifications of shared conceptualizations

Well-known ontologies:

  • RDF, RDFS, Dublin Core, FOAF, ORE, schema.org, SKOS, EDM, OWL

Fedora4 community ontology:

  • PCDM (Samvera + Islandora)

Less well-known ontologies:

  • BIBFRAME, Fedora, W.I.P MODS/MADS, Darwincore-SW

18 of 131

Vocabularies

Controlled list of terms, each with a URI.

Building blocks that you can use to make an ontology or describe data with.

Well known vocabularies:

  • Library of Congress: Subject Headings, Names, MARC Relators
  • Virtual International Authority File (VIAF)
  • Getty vocabularies (AAT, ULAN, TGN)
  • GeoNames
  • DBpedia

19 of 131

Linked Data Serializations

  • Serializations of RDF graphs:
    • N3 (Notation 3)
    • Turtle
    • N-Triples
    • RDF/XML
    • JSON-LD
    • RDFa
  • Prefixes / Namespaces

N-Triples:

<http://localhost:8080/fcrepo/rest/path/to/resource> <http://purl.org/dc/elements/1.1/title> "The Sloth" .

Turtle:

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://localhost:8080/fcrepo/rest/path/to/resource> dc:title "The Sloth" .
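To make the N-Triples form concrete, here is a minimal Python sketch that writes one triple per line with full IRIs and an escaped literal. The helper names `nt_literal` and `nt_triple` are made up for this illustration, and the sketch ignores datatypes and blank nodes; a real project would more likely use an RDF library.

```python
def nt_literal(value, lang=None):
    """Format a plain or language-tagged literal for N-Triples."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def nt_triple(subject, predicate, obj, obj_is_iri=False, lang=None):
    """Serialize one triple as a single N-Triples line (hypothetical helper)."""
    obj_term = f"<{obj}>" if obj_is_iri else nt_literal(obj, lang)
    return f"<{subject}> <{predicate}> {obj_term} ."


line = nt_triple(
    "http://localhost:8080/fcrepo/rest/path/to/resource",
    "http://purl.org/dc/elements/1.1/title",
    "The Sloth",
)
print(line)
```

Turtle carries the same triple; it only adds prefix declarations so the IRIs can be abbreviated.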

20 of 131

rdf:type

  • rdf:type defines what type of “thing” you are describing
  • "a" is a synonym for rdf:type: <dbpedia.org/page/Dragon> a owl:Thing
  • A resource can have many types:

21 of 131

Turtle Serialization

@prefix dbr:<http://dbpedia.org/resource/> .

@prefix dbp:<http://dbpedia.org/property/> .

@prefix dbc:<http://dbpedia.org/resource/Category:> .

@prefix dbo:<http://dbpedia.org/ontology/> .

@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> .

@prefix xsd:<http://www.w3.org/2001/XMLSchema#> .

@prefix dcterms:<http://purl.org/dc/terms/> .

dbr:How_to_Train_Your_Dragon rdfs:label "How To Train Your Dragon"@en, "Le eroiche disavventure di Topicco Terribilis Totanus III"@it ;

    dbp:author dbr:Cressida_Cowell ;

    dbp:pubDate "2003"^^xsd:integer ;

    dcterms:subject dbc:Fictional_Vikings .

dbr:Cressida_Cowell a dbo:author ;

    dbo:abstract "Cressida Cowell (born 15 April 1966) is an English children's author, popularly known for the novel series, How to Train Your Dragon"@en .

http://dbpedia.org/data/How_to_Train_Your_Dragon.n3

22 of 131

N-Triples Serialization

<http://dbpedia.org/resource/How_to_Train_Your_Dragon> <http://www.w3.org/2000/01/rdf-schema#label> "How To Train Your Dragon"@en .

<http://dbpedia.org/resource/How_to_Train_Your_Dragon> <http://www.w3.org/2000/01/rdf-schema#label> "Le eroiche disavventure di Topicco Terribilis Totanus III"@it .

<http://dbpedia.org/resource/How_to_Train_Your_Dragon> <http://dbpedia.org/property/author> <http://dbpedia.org/resource/Cressida_Cowell> .

<http://dbpedia.org/resource/How_to_Train_Your_Dragon> <http://dbpedia.org/property/pubDate> "2003"^^<http://www.w3.org/2001/XMLSchema#integer> .

<http://dbpedia.org/resource/How_to_Train_Your_Dragon> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Fictional_Vikings> .

<http://dbpedia.org/resource/Cressida_Cowell> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/author> .

<http://dbpedia.org/resource/Cressida_Cowell> <http://dbpedia.org/ontology/abstract> "Cressida Cowell (born 15 April 1966) is an English children's author, popularly known for the novel series, How to Train Your Dragon..."@en .

23 of 131

RDF/XML Serialization

<?xml version="1.0" encoding="utf-8" ?>

<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"

xmlns:dct="http://purl.org/dc/terms/"

xmlns:dbp="http://dbpedia.org/property/">

<rdf:Description rdf:about="http://dbpedia.org/resource/How_to_Train_Your_Dragon">

<rdfs:label xml:lang="en">How to Train Your Dragon</rdfs:label>

<rdfs:label xml:lang="it">Le eroiche disavventure di Topicco Terribilis Totanus III</rdfs:label>

<dct:subject rdf:resource="http://dbpedia.org/resource/Category:Fictional_Vikings" />

<dbp:author rdf:resource="http://dbpedia.org/resource/Cressida_Cowell" />

<dbp:pubDate rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2003</dbp:pubDate>

</rdf:Description>

</rdf:RDF>

24 of 131

RDF Blank Nodes (bnodes)

  • Anonymous resources: they neither have nor need their own URI

  • Some systems skolemize blank nodes so that they have a name
  • Can make interactions more complex
  • Make comparisons of RDF graphs complicated

<> dcterms:extent [ rdf:value "1 volume (420 pages): photographs" ] .

<> dcterms:extent _:b .

_:b rdf:value "1 volume (420 pages): photographs" .
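Skolemization, mentioned above, can be sketched in a few lines of Python. This is an illustration only: triples are plain tuples of strings, any term starting with `_:` is treated as a blank node, and the `http://example.org/.well-known/genid/` base is a placeholder chosen for the example (the `.well-known/genid` path follows the convention suggested in RDF 1.1 Concepts).

```python
import uuid


def skolemize(triples, base="http://example.org/.well-known/genid/"):
    """Replace blank node labels (terms starting with '_:') with minted IRIs."""
    minted = {}

    def swap(term):
        if isinstance(term, str) and term.startswith("_:"):
            if term not in minted:
                # Mint one IRI per blank node label and reuse it everywhere.
                minted[term] = base + uuid.uuid4().hex
            return minted[term]
        return term

    return [tuple(swap(term) for term in triple) for triple in triples]


graph = [
    ("<>", "dcterms:extent", "_:b"),
    ("_:b", "rdf:value", "1 volume (420 pages): photographs"),
]
skolemized = skolemize(graph)
```

After skolemizing, the former blank node has a name that other statements and other systems can refer to, which is what makes graph comparison and later updates simpler.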

25 of 131

Linked Data “Rules”

  • Use URIs as names for things.
  • Use HTTP URIs so that people can look up those names.
  • When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
  • Include links to other URIs, so that they can discover more things.

26 of 131

Linked Data Recommendations

  • Don’t reinvent the wheel. Use well known ontologies and/or vocabularies.
  • Keep resource description as atomic as possible. Hierarchy is given by an RDF graph, not by a single resource.
  • Avoid blank RDF nodes
  • Domain-specific and local supplements

27 of 131

Linked Data Platform (LDP)

  • Describes a way to interact with linked data resources: “find resources and follow links, publish new resources, edit and delete existing ones” (LDP-Primer)
  • W3C standard; builds upon other existing standards
  • Everything is a web resource with a URI
  • Provides a terminology for describing relevant concepts
    • RDFSource - resources have properties whose state can be represented as RDF
    • NonRDFSource - resource whose state cannot be represented as RDF (a binary)
    • Container - resource which organizes/collects other resources (like a directory)
  • Resources can contain other resources (containers) or files (binaries)

28 of 131

Linked Data Platform (LDP)

  • Provides basic level of interoperability for reading/writing linked data on web
    • Minimally required RDF serializations
    • Advertise & Discover capabilities of a resource (HTTP methods accepted, etc)
    • Given a binary resource, how do we discover a description of it?
  • Codifies best practices (e.g. use of ETags, descriptions of constraints)
  • Allows flexible implementation strategy
    • Writes not required; can make a read-only repository
    • Server-managed triples

29 of 131

Linked Data Platform and Fedora

  • Support the creation and management of LDP Containers
  • LDP-R: A Linked Data Platform Resource as defined in LDP
    • LDP RDF Source (LDP-RS)
    • LDP Non-RDF Source (LDP-NR)
  • Support HTTP request methods to perform CRUD operations
    • HTTP GET (Read operation)
    • HTTP HEAD (Read operation)
    • HTTP POST (Create operation)
    • HTTP PUT (Update/Create operation)
    • HTTP PATCH (Update operation)
    • HTTP DELETE (Delete operation)

30 of 131

Applied CRUD

Our primary way of managing Fedora resources

  • Interact with resources via:
    • HTML Interface for browsers (abridged/simplified)
      • Basic/default UI; not necessarily user-facing
    • HTTP Client (most comprehensive)
      • curl is popular on the command line
      • Every relevant programming language has some sort of http client library

31 of 131

Hands-on: Fedora CRUD

  • Part 1: CRUD operations via Fedora HTML UI

  • Part 2: CRUD operations via Fedora HTTP API
    • Curl command

32 of 131

CRUD via HTML UI

Fundamental operations

33 of 131

Available Operations via HTML UI

  • GET/HEAD/OPTIONS (Retrieval)
  • POST/PUT (Creation)
  • PATCH (Update)
  • DELETE (Removal)

34 of 131

Startup

In a browser navigate to http://localhost:8080/fcrepo/rest

35 of 131

HTML Interface Cheatsheet

[Screenshot: Fedora HTML UI with callouts labeling the Resource URI, the Slug field, and the POST, GET, PATCH, and DELETE controls.]

36 of 131

Step 1a: RDF Resource Creation (POST)

  1. Go to http://localhost:8080/fcrepo/rest (root node)
  2. In “Type” select field choose container (default)
  3. In “Identifier” text field enter basic
  4. Press “add” button

This will create a new RDF Resource (LDP Basic Container) and redirect us to our next slide!

37 of 131

Step 1b: RDF Resource Creation (POST)

  • You will be redirected to http://localhost:8080/fcrepo/rest/basic
  • In “Type” select field choose container (default)
  • In “Identifier” text field enter collection
  • Press “add” button

This will create a new RDF Resource (LDP Basic Container) and redirect us to our next slide.

38 of 131

Step 1c: RDF Resource Creation (POST)

This will create a new RDF Resource (LDP Basic Container) and redirect us to our next slide!

39 of 131

Step 2: Resource Retrieval (GET)

  • Every time you were redirected after creating a container, your browser issued a GET.
  • A resource is retrieved directly at its LDP path; the response contains user-supplied and some server-managed RDF triples.

40 of 131

Binaries

We’ll use an image for (most) examples of binaries

Download from: https://goo.gl/Nfrv7s

41 of 131

Step 3: Binary Resource Creation (POST)

This will create a new Binary Resource (LDP Non RDF Source) and redirect us to our next slide!

42 of 131

Step 4: Binary Resource Retrieval (GET)

  • You will be redirected to http://localhost:8080/fcrepo/rest/basic/images/loc.jpg/fcr:metadata
  • Notice the fcr:metadata part!
    1. Image content is at “loc.jpg”
    2. Its metadata (RDF properties you can manipulate) is at a virtual subpath named /fcr:metadata
    3. The fcr:metadata part is an implementation detail, but the fact that the RDF that describes a binary has a URI and can be linked to is important.

Why? LDP says that a server can create an additional descriptive resource containing RDF describing binaries

43 of 131

Step 5: Update RDF Properties (PATCH)

DELETE {}
INSERT { <> ebucore:width "100" }
WHERE {}

c. Press “Update”

44 of 131

Our updated RDF Properties from step 5.

45 of 131

Last step: Delete a resource (DELETE)

What do you see?

46 of 131

Departed

Fedora 4 creates tombstone resources at the “original/path/fcr:tombstone” URL, in this case

“basic/images/loc.jpg/fcr:tombstone” (try that last path in your browser)

So, to recreate a resource at that same path, you need to delete the tombstone placeholder first, and that cannot be done via the HTML UI

Discovered tombstone resource at /basic/images/loc.jpg, departed: 2018-01-21T15:44:01.373Z

47 of 131

CRUD via CURL

Interacting with Fedora HTTP API

48 of 131

Quick reminder on how to use curl

$ curl -X METHOD -u user:password -v -i -H "headername: headervalue" --data-binary "@filename" URL

  • -X one of POST/PUT/GET/(all CRUD) HTTP request methods
  • -u server authentication user:password pair
  • -v verbose
  • -i show headers
  • -I do a HEAD request
  • -H header key:value pairs to send (HTTP header names are case-insensitive)
  • --data-binary sets body to binary content of file named filename.
  • -d sets body to ascii content of file (using @) or inline data
  • URL Resource URL (endpoint)
  • -L follow redirects
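The same request can be assembled with any HTTP client library. As a rough sketch, the Python standard-library version below builds (but does not send) a request and notes which curl flag each line corresponds to. `build_request` is a helper invented for this example; the URL and credentials are the workshop VM defaults.

```python
import base64
import urllib.request


def build_request(method, url, user, password, headers=None, body=None):
    """Build, without sending, an HTTP request that mirrors the curl flags."""
    # -X METHOD, URL, and the body from -d / --data-binary
    request = urllib.request.Request(url, data=body, method=method)
    # -u user:password (HTTP Basic authentication)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    # -H "headername: headervalue"
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


req = build_request(
    "GET",
    "http://localhost:8080/fcrepo/rest",
    "fedoraAdmin",
    "secret3",
    headers={"Accept": "text/turtle"},
)
# With the Vagrant VM running, urllib.request.urlopen(req) would send it.
```

Building the request separately from sending it makes it easy to inspect the method and headers first, much like reading a curl command before pressing Enter.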

49 of 131

GET: Containers

Controlling Response Serialization via Request Header

  • Accept: content-negotiation for different RDF variants
    • application/ld+json
    • application/n-triples
    • application/rdf+xml
    • text/n3 (or text/rdf+n3)
    • text/plain
    • text/turtle

$ curl -i -H "Accept:text/turtle" -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest

50 of 131

HTTP/1.1 200 OK

Server: Apache-Coyote/1.1

Cache-Control: private

Expires: Thu, 01 Jan 1970 00:00:00 UTC

Link: <http://www.w3.org/ns/ldp#Resource>;rel="type"

Link: <http://www.w3.org/ns/ldp#Container>;rel="type"

Link: <http://www.w3.org/ns/ldp#BasicContainer>;rel="type"

Accept-Patch: application/sparql-update

Accept-Post: text/turtle,text/rdf+n3,text/n3,application/rdf+xml,application/n-triples,application/ld+json,multipart/form-data,application/sparql-update

Allow: MOVE,COPY,DELETE,POST,HEAD,GET,PUT,PATCH,OPTIONS

Preference-Applied: return=representation

Vary: Prefer

Vary: Accept, Range, Accept-Encoding, Accept-Language

Content-Type: text/turtle;charset=utf-8

Content-Length: 1410

Date: Thu, 18 Jan 2018 18:48:28 GMT

@prefix premis: <http://www.loc.gov/premis/rdf/v1#> .

@prefix test: <info:fedora/test/> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix xsi: <http://www.w3.org/2001/XMLSchema-instance> .

@prefix xmlns: <http://www.w3.org/2000/xmlns/> .

...

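[Callouts: the lines from HTTP/1.1 200 OK through Date are the response headers; the @prefix lines are the start of the response body.]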

51 of 131

...

@prefix premis: <http://www.loc.gov/premis/rdf/v1#> .

@prefix test: <info:fedora/test/> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix xsi: <http://www.w3.org/2001/XMLSchema-instance> .

@prefix xmlns: <http://www.w3.org/2000/xmlns/> .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix fedora: <http://fedora.info/definitions/v4/repository#> .

@prefix xml: <http://www.w3.org/XML/1998/namespace> .

@prefix ebucore: <http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#> .

@prefix ldp: <http://www.w3.org/ns/ldp#> .

@prefix xs: <http://www.w3.org/2001/XMLSchema> .

@prefix fedoraconfig: <http://fedora.info/definitions/v4/config#> .

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

@prefix authz: <http://fedora.info/definitions/v4/authorization#> .

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://localhost:8080/fcrepo/rest/>

rdf:type ldp:RDFSource ;

rdf:type ldp:Container ;

rdf:type ldp:BasicContainer ;

fedora:writable "true"^^<http://www.w3.org/2001/XMLSchema#boolean> ;

rdf:type fedora:RepositoryRoot ;

rdf:type fedora:Resource ;

rdf:type fedora:Container ;

fedora:hasTransactionProvider <http://localhost:8080/fcrepo/rest/fcr:tx> .

[Callout: the @prefix lines above are namespace prefixes.]

52 of 131

POST Containers/Binaries

Controlling Resource Creation via Request Header

  • Slug: foo (only POST)
    • Asks Fedora, if possible, to use foo as subpath for new resource
    • If the slug is omitted or conflicts with an existing resource, the server generates an arbitrary URI
  • Content-Type
    • Any non-RDF type will create a Binary Resource

$ curl -i -X POST -ufedoraAdmin:secret3 -H "Slug:abc" -H "Content-Type:text/plain" -d "abc" http://localhost:8080/fcrepo/rest

53 of 131

$ curl -i -X POST -u fedoraAdmin:secret3 -H "Slug:abc" -H "Content-Type:text/plain" -d "abc" http://localhost:8080/fcrepo/rest

HTTP/1.1 201 Created

Server: Apache-Coyote/1.1

ETag: "08cd3a9db9060a18d78e9522747330cff32fbcc2"

Last-Modified: Thu, 18 Jan 2018 18:56:11 GMT

Link: <http://localhost:8080/fcrepo/rest/abc/fcr:metadata>; rel="describedby"; anchor="http://localhost:8080/fcrepo/rest/abc"

Location: http://localhost:8080/fcrepo/rest/abc

Content-Type: text/plain

Content-Length: 37

Date: Thu, 18 Jan 2018 18:56:11 GMT

http://localhost:8080/fcrepo/rest/abc

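  • Always check your response codes (here: 201 Created)
  • The Location header is the final path of your new resource
  • It is a binary, so its RDF description is at the describedby Link target (http://localhost:8080/fcrepo/rest/abc/fcr:metadata)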

54 of 131

$ curl -i -X POST -u fedoraAdmin:secret3 -H "Slug:abc" -H "Content-Type:text/plain" -d "abc" http://localhost:8080/fcrepo/rest

HTTP/1.1 201 Created

Server: Apache-Coyote/1.1

ETag: "89c26230b3703581d021d20fb8ce014fc85a9fd3"

Last-Modified: Thu, 18 Jan 2018 18:56:17 GMT

Link: <http://localhost:8080/fcrepo/rest/2a/79/ee/0c/2a79ee0c-593e-4289-8e10-3272084942e9/fcr:metadata>; rel="describedby"; anchor="http://localhost:8080/fcrepo/rest/2a/79/ee/0c/2a79ee0c-593e-4289-8e10-3272084942e9"

Location: http://localhost:8080/fcrepo/rest/2a/79/ee/0c/2a79ee0c-593e-4289-8e10-3272084942e9

Content-Type: text/plain

Content-Length: 82

Date: Thu, 18 Jan 2018 18:56:17 GMT

http://localhost:8080/fcrepo/rest/2a/79/ee/0c/2a79ee0c-593e-4289-8e10-3272084942e9

Run the same curl command again. The slug abc is already in use, so it can’t be respected and Fedora 4 falls back to its internal PID minter.
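The Slug behaviour seen in these two responses can be modelled with a toy in-memory container. This is only an illustration of the rule "use the slug if the name is free, otherwise mint one". `ToyContainer` is invented for the example, and the random hex name is a stand-in that does not reproduce Fedora's PID minter or its pairtree-style paths.

```python
import uuid


class ToyContainer:
    """In-memory stand-in for a container; not Fedora's implementation."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.children = {}

    def post(self, body, slug=None):
        """Honour the Slug hint when the name is free, otherwise mint a name."""
        name = slug if slug and slug not in self.children else uuid.uuid4().hex
        self.children[name] = body
        return 201, f"{self.base_url}/{name}"


root = ToyContainer("http://localhost:8080/fcrepo/rest")
first = root.post(b"abc", slug="abc")    # slug is free, so it is used
second = root.post(b"abc", slug="abc")   # slug is taken, so a name is minted
```

Both calls succeed with 201, which is why clients should read the Location header rather than assume the slug was used.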

55 of 131

GET: Binaries

  • Discover the binary’s description by doing HEAD, then looking for describedby link header

  • The describedby header is defined by LDP, so that resource metadata can be discovered by clients
    • Link: <http://localhost:8080/fcrepo/rest/abc/fcr:metadata>; rel="describedby"
  • Follow that link

$ curl -I -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/abc

$ curl -i -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/abc/fcr:metadata
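A client can do the same discovery programmatically by scanning the Link headers for rel="describedby". The sketch below is a deliberately small parser that handles the header shapes shown in this workshop; `find_links` is a hypothetical helper, and it is not a complete implementation of the Link header grammar.

```python
import re

# One <target> followed by zero or more ;name=value parameters.
LINK_PATTERN = re.compile(
    r'<([^>]*)>\s*((?:;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+)\s*)*)'
)
PARAM_PATTERN = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def find_links(link_headers, rel):
    """Return the target of every Link header value whose rel matches."""
    targets = []
    for header in link_headers:
        for target, params in LINK_PATTERN.findall(header):
            for name, quoted, bare in PARAM_PATTERN.findall(params):
                if name.lower() == "rel" and rel in (quoted or bare).split():
                    targets.append(target)
    return targets


links = [
    '<http://www.w3.org/ns/ldp#Resource>;rel="type"',
    '<http://localhost:8080/fcrepo/rest/abc/fcr:metadata>; rel="describedby"; '
    'anchor="http://localhost:8080/fcrepo/rest/abc"',
]
described_by = find_links(links, "describedby")
```

Following the link instead of appending /fcr:metadata by hand keeps the client independent of that implementation detail.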

56 of 131

PUT Containers/Binaries

Similar to POST but:

  • The request URL is the final resource path (no Slug!)
  • If a resource exists at the given path, it is replaced.
    • If replacing RDF: first GET the resource, add or modify triples, then PUT it back.
    • <> == <http://localhost:8080/fcrepo/rest/demoresource> == <demoresource>

$ curl -i -XPUT -ufedoraAdmin:secret3 -H"Content-Type:text/turtle" -d 'PREFIX dc: <http://purl.org/dc/elements/1.1/> <> dc:title "Demo Resource"' http://localhost:8080/fcrepo/rest/demoresource

57 of 131

PUT: local Binaries

# This will download a binary image to the current directory

$ curl -L -o loc.jpg https://goo.gl/Nfrv7s

# This will PUT the binary into Fedora

$ curl -i -X PUT -u fedoraAdmin:secret3 -H "Content-Type: image/jpeg" --data-binary "@loc.jpg" http://localhost:8080/fcrepo/rest/curl/loc.jpg

58 of 131

PATCH: Containers (RDF)

  • Updates an RDF resource
  • SPARQL Update (https://www.w3.org/TR/sparql11-update/)
    • Content-Type: application/sparql-update
    • INSERT DATA {...} or DELETE DATA {...} for simple operations
    • DELETE { } INSERT { } WHERE { } for more complex updates that match triple patterns

59 of 131

PATCH

  • Using SPARQL Update

$ curl -i -XPATCH -H "Content-type:application/sparql-update" -ufedoraAdmin:secret3 -d "INSERT DATA {<> <http://purl.org/dc/elements/1.1/title> 'Library of Congress'}" http://localhost:8080/fcrepo/rest/curl/loc.jpg/fcr:metadata

$ curl -i -XPATCH -H "Content-type:application/sparql-update" -ufedoraAdmin:secret3 -d "DELETE {<> <http://purl.org/dc/elements/1.1/title> ?o} INSERT {<> <http://purl.org/dc/elements/1.1/title> 'Great Hall at the Library of Congress'} WHERE {<> <http://purl.org/dc/elements/1.1/title> ?o}" http://localhost:8080/fcrepo/rest/curl/loc.jpg/fcr:metadata

It’s easier/cleaner to build your sparql-update commands in an external file (e.g. myupdate.sparql), then pass it as --data-binary "@myupdate.sparql"
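Building the update text in code avoids most shell-quoting problems. The following sketch generates the same DELETE/INSERT/WHERE update as the second command above. `sparql_string` and `replace_title` are names made up for this example, and the escaping covers only backslashes, double quotes, and line breaks.

```python
def sparql_string(value):
    """Quote a Python string as a double-quoted SPARQL string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def replace_title(new_title):
    """Build an update that swaps an existing dc:title for a new one."""
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "DELETE { <> dc:title ?old }\n"
        f"INSERT {{ <> dc:title {sparql_string(new_title)} }}\n"
        "WHERE { <> dc:title ?old }\n"
    )


update = replace_title("Great Hall at the Library of Congress")
print(update)
```

The printed text could be saved as myupdate.sparql and sent with --data-binary. Note that, as with the curl version, nothing is inserted when the resource has no dc:title yet, because the WHERE clause then matches nothing.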

60 of 131

DELETE

  • Removes a resource, but leaves a tombstone

$ curl -i -XDELETE -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/curl/loc.jpg

HTTP/1.1 204 No Content

Date: Fri, 19 Jan 2018 20:03:15 GMT

  • A follow-up GET on the same URL now returns 410 Gone, with a link to the tombstone

$ curl -i -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/curl/loc.jpg

HTTP/1.1 410 Gone

Server: Apache-Coyote/1.1

Cache-Control: private

Expires: Thu, 01 Jan 1970 00:00:00 UTC

Link: <http://localhost:8080/fcrepo/rest/curl/loc.jpg/fcr:tombstone>; rel="hasTombstone"

Content-Type: text/plain

Content-Length: 82

Date: Fri, 19 Jan 2018 20:04:11 GMT

61 of 131

DELETE, for real!

  • Let’s say we wanted to PUT the image back:

$ curl -i -X PUT -ufedoraAdmin:secret3 -H "Content-Type: image/jpeg" --data-binary "@loc.jpg" http://localhost:8080/fcrepo/rest/curl/loc.jpg

HTTP/1.1 410 Gone

...

Link: <http://localhost:8080/fcrepo/rest/curl/loc.jpg/fcr:tombstone>; rel="hasTombstone"

...

Date: Fri, 19 Jan 2018 20:06:06 GMT

Discovered tombstone resource at /curl/loc.jpg, departed: 2018-01-19T19:53:52.768Z

  • Cannot overwrite tombstones, but we can delete them!

$ curl -i -XDELETE -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/curl/loc.jpg/fcr:tombstone

62 of 131

DELETE, for real!

  • Now let’s GET the resource

$ curl -i -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/curl/loc.jpg

HTTP/1.1 404

  • Now it’s 404 (not found) vs 410 (“gone”)
  • Let’s PUT again

$ curl -i -X PUT -ufedoraAdmin:secret3 -H "Content-Type: image/jpeg" --data-binary "@loc.jpg" http://localhost:8080/fcrepo/rest/curl/loc.jpg

HTTP/1.1 201
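The whole delete, tombstone, and recreate sequence can be summarized as a small state model. The class below is a toy written for this walkthrough, not Fedora code. It reproduces the status codes shown on these slides, and two details are assumptions: that deleting a tombstone answers 204, and that a PUT over an existing resource answers 204.

```python
class ToyRepository:
    """In-memory model of the status codes in this walkthrough."""

    TOMBSTONE_SUFFIX = "/fcr:tombstone"

    def __init__(self):
        self.resources = {}
        self.tombstones = set()

    def get(self, path):
        if path in self.tombstones:
            return 410  # Gone: a tombstone marks the deleted resource
        return 200 if path in self.resources else 404

    def put(self, path, body):
        if path in self.tombstones:
            return 410  # tombstones cannot be overwritten
        created = path not in self.resources
        self.resources[path] = body
        return 201 if created else 204  # 204 on replace is an assumption

    def delete(self, path):
        if path.endswith(self.TOMBSTONE_SUFFIX):
            target = path[: -len(self.TOMBSTONE_SUFFIX)]
            if target in self.tombstones:
                self.tombstones.remove(target)
                return 204  # assumed status for deleting a tombstone
            return 404
        if path in self.resources:
            del self.resources[path]
            self.tombstones.add(path)  # deleting leaves a tombstone behind
            return 204
        return 404


repo = ToyRepository()
statuses = [
    repo.put("/curl/loc.jpg", b"image bytes"),
    repo.delete("/curl/loc.jpg"),
    repo.get("/curl/loc.jpg"),
    repo.put("/curl/loc.jpg", b"image bytes"),
    repo.delete("/curl/loc.jpg/fcr:tombstone"),
    repo.get("/curl/loc.jpg"),
    repo.put("/curl/loc.jpg", b"image bytes"),
]
```

Reading `statuses` from left to right follows the same steps as the curl commands: create, delete, two requests blocked by the tombstone, tombstone removal, a 404, and finally a successful recreate.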

63 of 131

Summary

  • The UI is a quick and easy way to read/write to the repository
    • ... but it hides a lot from you
  • CURL is flexible and comprehensive
    • View/set headers
    • Examine entire responses
    • Can do things that the UI can’t, like DELETE tombstones

For more information:

https://wiki.duraspace.org/x/fAsCB

64 of 131

Break

Resume at 11:20

65 of 131

Fixity

Esmé Cowles

66 of 131

Fixity

  • Making sure bits don't change unexpectedly
  • Caveat: It's not always that easy

67 of 131

Two Kinds of Fixity

Transmission Fixity

  • Network corruption
  • Dropped connection
  • Pre-ingest copying errors

Persistence Fixity

  • Bit rot
  • Disk failure
  • Migration copying errors
  • Unauthorized modification

68 of 131

Fixity via CURL

Interacting with HTTP

69 of 131

Creating a sample data file

    • You can use any data file you like
    • But:
      • It should be at least 4KB
      • We're going to use the SHA-1 checksum

$ head -c 8192 /dev/zero > file1.dat

$ shasum file1.dat

0631457264ff7f8d5fb1edc2c0211992a67c73e6 file1.dat

70 of 131

Transmission Fixity Failure

  • Uploading a file using PUT
  • Specifying a checksum with the Digest header
  • Using incorrect value to demonstrate failure

$ curl -i -u fedoraAdmin:secret3 -X PUT --data-binary @file1.dat \
    -H "Digest: sha1=bad" http://localhost:8080/fcrepo/rest/file1

71 of 131

Transmission Fixity Failure

HTTP/1.1 409 Conflict

Server: Apache-Coyote/1.1

Cache-Control: private

Expires: Thu, 01 Jan 1970 00:00:00 UTC

Content-Type: text/plain;charset=utf-8

Content-Length: 88

Date: Sat, 13 Jan 2018 12:36:11 GMT

Checksum Mismatch of urn:sha1:bad and urn:sha1:0631457264ff7f8d5fb1edc2c0211992a67c73e6

72 of 131

Transmission Fixity Success

  • Uploading a file using PUT
  • Specifying a checksum with the Digest header
  • Using the correct checksum

$ curl -i -u fedoraAdmin:secret3 -X PUT --data-binary @file1.dat \
    -H "Digest: sha1=0631457264ff7f8d5fb1edc2c0211992a67c73e6" \
    http://localhost:8080/fcrepo/rest/file1
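Rather than copying the checksum from shasum by hand, a client can compute the Digest header value itself just before uploading. A minimal sketch using Python's standard `hashlib`; the helper name `sha1_digest_header` is invented for this example, and the `sha1=<hex>` form simply mirrors the header used in the command above.

```python
import hashlib


def sha1_digest_header(data):
    """Return a Digest header value in the sha1=<hex> form used above."""
    return "sha1=" + hashlib.sha1(data).hexdigest()


# Same content as the sample file: head -c 8192 /dev/zero > file1.dat
sample = bytes(8192)
header_value = sha1_digest_header(sample)
print(header_value)
```

For the 8192-byte file of zeros, the printed value should agree with the shasum output for file1.dat; if it does not, the local file differs from the sample.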

73 of 131

Transmission Fixity Success

HTTP/1.1 201 Created

Server: Apache-Coyote/1.1

Cache-Control: private

Expires: Thu, 01 Jan 1970 00:00:00 UTC

ETag: "4a2cd452fee08d0b7dabb20c093daf154b822840"

Last-Modified: Sat, 13 Jan 2018 12:36:36 GMT

Link: <http://localhost:8080/fcrepo/rest/file1/fcr:metadata>; rel="describedby"; anchor="http://localhost:8080/fcrepo/rest/file1"

Location: http://localhost:8080/fcrepo/rest/file1

Content-Type: text/plain

Content-Length: 39

Date: Sat, 13 Jan 2018 12:36:36 GMT

http://localhost:8080/fcrepo/rest/file1

74 of 131

Persistence Fixity Success

  • Validating that our file hasn't changed

$ curl -i -u fedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/file1/fcr:fixity

75 of 131

Persistence Fixity Success

<http://localhost:8080/fcrepo/rest/file1>

premis:hasFixity <http://localhost:8080/fcrepo/rest/file1#fixity/1515847016030> .

<http://localhost:8080/fcrepo/rest/file1#fixity/1515847016030>

rdf:type premis:Fixity ;

rdf:type premis:EventOutcomeDetail ;

premis:hasEventOutcome "SUCCESS" ;

premis:hasMessageDigestAlgorithm "SHA-1"^^ ;

premis:hasMessageDigest <urn:sha1:0631457264ff7f8d5fb1edc2c0211992a67c73e6> ;

premis:hasSize "8192" .

76 of 131

Extra Credit: Persistence Fixity Failure

  • Modify the file on disk in the Vagrant environment
  • Validate that the file has been corrupted

$ vagrant ssh
vagrant$ sudo -s
root$ echo "corruption" > /var/lib/tomcat7/fcrepo4-data/fcrepo.binary.directory/06/31/45/0631457264ff7f8d5fb1edc2c0211992a67c73e6

$ curl -i -u fedoraAdmin:secret3 \
    http://localhost:8080/fcrepo/rest/file1/fcr:fixity

77 of 131

Extra Credit: Persistence Fixity Failure (headers)

HTTP/1.1 200 OK

Server: Apache-Coyote/1.1

Cache-Control: private

Expires: Thu, 01 Jan 1970 00:00:00 UTC

Link: <http://www.w3.org/ns/ldp#Resource>; rel="type"

Link: <http://www.w3.org/ns/ldp#RDFSource>; rel="type"

Content-Type: text/turtle;charset=utf-8

Content-Length: 1601

Date: Sat, 13 Jan 2018 12:41:15 GMT

78 of 131

Extra Credit: Persistence Fixity Failure (body)

<http://localhost:8080/fcrepo/rest/file1>

premis:hasFixity <http://localhost:8080/fcrepo/rest/file1#fixity/1515847275733> .

<http://localhost:8080/fcrepo/rest/file1#fixity/1515847275733>

rdf:type premis:Fixity ;

rdf:type premis:EventOutcomeDetail ;

premis:hasEventOutcome "BAD_CHECKSUM" ;

premis:hasEventOutcome "BAD_SIZE" ;

premis:hasMessageDigestAlgorithm "SHA-1" ;

premis:hasMessageDigest <urn:sha1:6a7bb2556144babe3899b25e5428123735bb1e27> ;

premis:hasSize "11" .

79 of 131

Fixity: API Specification

  • Transmission Fixity
    • No change
  • Persistence Fixity
    • Use Want-Digest header to request checksum calculation
    • Client needs to verify the checksum

$ curl -I -u fedoraAdmin:secret3 -H "Want-Digest: sha1" \
    http://localhost:8080/fcrepo/rest/file1
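Because the client now does the verification, it needs its own record of the expected checksum. The sketch below compares a Digest response header with a stored value. It assumes the server answers in the same `sha1=<hex>` form used for uploads in this workshop, which should be checked against the actual response; `verify_digest` is a hypothetical helper.

```python
import hashlib


def verify_digest(digest_header, expected_sha1_hex):
    """Compare a 'sha1=<hex>' Digest header with a checksum recorded earlier."""
    algorithm, _, value = digest_header.strip().partition("=")
    if algorithm.lower() != "sha1":
        raise ValueError(f"unexpected digest algorithm: {algorithm!r}")
    return value.lower() == expected_sha1_hex.lower()


# Checksum the client recorded when it ingested the 8192-byte sample file.
recorded = hashlib.sha1(bytes(8192)).hexdigest()

intact = verify_digest("sha1=" + recorded, recorded)
corrupted = verify_digest("sha1=bad", recorded)
```

Here `intact` stands for a response whose digest still matches and `corrupted` for one that does not; what to do on a mismatch (alert, restore from a copy) is a local policy decision.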

80 of 131

Versioning

Esmé Cowles

81 of 131

Versioning via CURL

Interacting with HTTP

82 of 131

Creating a Version

  • Create a version of the current state of a file or container

$ curl -i -u fedoraAdmin:secret3 -X POST -H "Slug: v1" \
    http://localhost:8080/fcrepo/rest/file1/fcr:versions

83 of 131

Updating the resource

  • Change the current state of the resource

$ curl -i -u fedoraAdmin:secret3 -X PUT -d "updated stuff" \
    http://localhost:8080/fcrepo/rest/file1

84 of 131

Discover versions

$ curl -i -u fedoraAdmin:secret3 \
    http://localhost:8080/fcrepo/rest/file1/fcr:versions

<http://localhost:8080/fcrepo/rest/file1>

fedora:hasVersion <http://localhost:8080/fcrepo/rest/file1/fcr:versions/v1> .

<http://localhost:8080/fcrepo/rest/file1/fcr:versions/v1>

fedora:hasVersionLabel "v1"^^<http://www.w3.org/2001/XMLSchema#string> ;

fedora:created "2018-01-13T12:20:44.95Z" .

85 of 131

Retrieve a previous version

  • Retrieve the version of the resource labeled "v1"

$ curl -i -u fedoraAdmin:secret3 \
    http://localhost:8080/fcrepo/rest/file1/fcr:versions/v1

86 of 131

Memento

  • HTTP Framework for Time-Based Access to Resource States (RFC 7089)
    • Existing standard for exposing versions
    • Fedora API Specification extends Memento for creating versions
  • Memento entities
    • Original Resource: the resource being versioned
    • Time Gate: negotiates for previous versions
    • Memento: a version of a resource
    • Time Map: information about available versions
  • Client-managed vs. server-managed versioning

87 of 131

Memento in Fedora

  • GET/HEAD on a resource
    • Link rel="timegate" referencing the Time Gate (same as the resource)
    • Link rel="timemap" referencing the Time Map (version container)
  • GET the Time Gate with an Accept-Datetime header to retrieve a version
  • GET the Time Map to discover versions and their timestamps
    • Must support application/link-format, may support other formats

  • POST to Time Map to create a version (if POST allowed)
    • Empty POST: creates a version of the current state
    • POST with Memento-Datetime header and body: creates a historical version
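Memento's Accept-Datetime header takes an HTTP-style date in GMT. As a small sketch, Python's standard library can produce that format from a timezone-aware datetime; the `accept_datetime` helper is invented for the example, and the timestamp is the one from the version listing earlier in this section.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(moment):
    """Format an aware datetime as the GMT date string Accept-Datetime expects."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


wanted = datetime(2018, 1, 13, 12, 20, 44, tzinfo=timezone.utc)
headers = {"Accept-Datetime": accept_datetime(wanted)}
print(headers)
```

A client would send these headers with a GET to the Time Gate and then follow the response to the memento closest to that time.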

88 of 131

Activity Streams

Notifications in Fedora

Mike Durbin

89 of 131

What are Activity Streams?

Fundamental concepts

90 of 131

Notification Events in Fedora

“For every resource whose state is changed as a result of an HTTP operation, there MUST be a corresponding notification made available describing that change.”

  • Notifications are represented as prescribed in the activitystreams-core specification
  • Notifications must:
    • Identify the resource changed
    • Indicate the event type

91 of 131

Terminology Clarification

Throughout the history of this project, the following terms have been used to describe the same aspect of the repository software:

  • Messaging
  • Activity Streams
  • Notifications

92 of 131

Roles and Responsibilities

  • Repository
    • Do CRUD well. Once a change to a repository resource is durable, broadcast a message describing what happened to it.
  • Messaging Framework
    • Accept messages from the repository
    • Deliver messages to clients in a timely fashion
  • Client
    • Accept messages delivered from the messaging framework
    • Determine if the message is relevant to client’s business
    • Act upon it somehow.

93 of 131

Key characteristics

  • From repository’s perspective “fire and forget”
  • Messaging framework responsible for delivery guarantees (choice and configuration of messaging software)
    • Durability of messages
    • Timeliness of message
    • Ordering of messages
    • Enqueueing messages
  • Clients operate asynchronously
    • Message arrives some time after event occurred
    • Can operate at their own pace without affecting other clients, or the repository

94 of 131

Messaging Clients

  • Software/service external to the repository
    • A standalone service, or part of a standalone service
  • Intentionally subscribe to a “channel” (a topic or queue) that contains messages emitted by the desired source (e.g. Fedora)
  • Inspect message for relevance (What sort of action occurred? On what resource? When? By whom?)
  • Perform some (potentially long-running) action
    • Stereotype: indexing

95 of 131

Messages

  • Headers + body (much like an HTTP response)
  • Body: unconstrained. Fedora uses JSON-LD messages.
    • Know what kind/format of messages you’re going to get before you subscribe
  • Anatomy of a message from Fedora:
    • Resource URI
    • rdf:type of resource
    • Parent resource URI
    • Type of event (C, U, D)
    • Time event occurred
    • User
  • Notably and intentionally absent: Content of resource

96 of 131

When does Fedora 4 emit messages?

When events occur that durably change your resources in Fedora 4 (CUD):

    • CREATE
    • UPDATE
    • DELETE

97 of 131

Messaging Patterns: Topics

  • Publish/subscribe
    • One sender (Fedora) publishes to a topic. By default, it’s named “fedora”.
    • Any client subscribed to “fedora” gets their own copy of each message.
    • Different subscribers do different things with messages (e.g. auditing, indexing, logging)
      • Each subscription to a topic has a different purpose
    • Durable subscriptions: store messages while a subscribed client is offline, resume delivery when it comes back online
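
A toy, in-memory illustration of the publish/subscribe pattern. It is not a broker and not how ActiveMQ is implemented; the `Topic` class and subscriber names are made up for the example.

```python
class Topic:
    """In-memory stand-in for a publish/subscribe topic (not a real broker)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = {}  # subscriber name -> list of received messages

    def subscribe(self, subscriber):
        self.subscribers[subscriber] = []

    def publish(self, message):
        # Every current subscriber receives its own copy of the message.
        for inbox in self.subscribers.values():
            inbox.append(dict(message))
        return len(self.subscribers)


topic = Topic("fedora")
# Nobody is subscribed yet, so this message reaches no one and is not kept.
delivered_early = topic.publish({"id": "/early"})

topic.subscribe("indexer")
topic.subscribe("audit-log")
delivered = topic.publish({"id": "/msg", "event": "create"})
```

Both subscribers end up with their own copy of the second message, and neither sees the first one, because this sketch has no durable subscriptions.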

98 of 131

Messaging Patterns: Queues

  • Message storing and Load Balancing
    • Sender (Fedora) publishes to a queue, clients remove messages from the queue
    • Only one copy of each message on queue. No two clients get the same message
    • Different subscribers do the same thing to each message on the queue.
      • Each queue has a different purpose. Each client of a queue has the same purpose
    • Inherently durable. Each message remains stored in queue until processed.
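
A toy, in-memory illustration of the queue pattern with two competing consumers. The `WorkQueue` class and worker names are hypothetical; a real broker adds acknowledgement and persistence.

```python
from collections import deque


class WorkQueue:
    """In-memory stand-in for a message queue (not a real broker)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # Messages wait in the queue until some consumer takes them.
        self._messages.append(message)

    def receive(self):
        # Removing the message guarantees no other consumer sees it.
        return self._messages.popleft() if self._messages else None


queue = WorkQueue()
for n in range(4):
    queue.send(f"/msg/{n}")

# Two workers with the same purpose take turns pulling from the queue.
handled = {"worker-a": [], "worker-b": []}
workers = list(handled)
turn = 0
while (message := queue.receive()) is not None:
    handled[workers[turn % 2]].append(message)
    turn += 1
```

Each message is handled by exactly one worker, which is what makes a queue suitable for load balancing.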

99 of 131

Messaging technologies

  • Broker (e.g. ActiveMQ) - routes messages between client and server, manages subscriptions/queues, persists messages when necessary
  • Protocol (e.g. OpenWire, AMQP, STOMP) - Defines how messages are transferred and acknowledged “over the wire”
  • Client API (e.g. JMS) - Provides methods for subscribing, sending, and receiving messages within an application
  • Apache Camel - Java-based integration platform for processing and routing messages within an application. Makes a good client library too.

100 of 131

Deployment Considerations

  • Fedora 4 has a built-in ActiveMQ broker enabled by default
    • Publishes messages to a topic called “fedora”
    • If no subscribers, messages are lost
  • In production, you’d want to send messages to an external broker
  • Use queues if you don’t want to lose messages when there are no subscribers
    • Or, if using ActiveMQ, “virtual destinations”, which behave like queues
  • To configure ActiveMQ in Fedora
    • Create an activemq.xml config file
    • Use system property fcrepo.activemq.configuration to point to its location
      • E.g. -Dfcrepo.activemq.configuration=file:/path/to/activemq.xml

101 of 131

Beyond CRUD

  • You’re likely to have topics and queues for Fedora events
  • You’re equally likely to have topics and queues for other messages!
    • Lots more to process than CRUD events to repository
    • General-purpose work queues supporting other workflows
  • Common non-CRUD use-case: reindexing
    • Iterate the resources in the repository, and place them into a queue
    • Apply a processing step (e.g. indexing) to all the resources in the reindexing queue
    • fcrepo-camel-toolbox has a general-purpose tool that iterates the repository and populates a specified queue!
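
A sketch of the reindexing idea: walk the containment tree and put every resource on a work queue. This is not the fcrepo-camel-toolbox implementation; the `containment` data and function name are hypothetical, and a real client would discover children by reading ldp:contains from the repository.

```python
from collections import deque


def enqueue_all(root, children_of, work_queue):
    """Walk the containment tree from root and append every resource path to work_queue."""
    pending = deque([root])
    while pending:
        resource = pending.popleft()
        work_queue.append(resource)
        pending.extend(children_of.get(resource, []))


# Hypothetical containment data standing in for the repository.
containment = {
    "/": ["/msg", "/basic"],
    "/msg": ["/msg/another"],
    "/basic": ["/basic/images"],
}
reindex_queue = []
enqueue_all("/", containment, reindex_queue)
```

A separate client would then take each path off `reindex_queue` and apply the processing step, such as indexing.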

102 of 131

Let’s try!

Download a utility to view messages

vagrant ssh

sudo wget -O /usr/local/bin/fcr-listen \
  https://github.com/birkland/fcr-listen/releases/download/0.0.1/fcr-listen-Linux-x86_64

sudo chmod +x /usr/local/bin/fcr-listen

fcr-listen

103 of 131

Let’s try!

We will inspect the messages emitted by Fedora

  • Add a resource to Fedora.
    • In your browser, go to http://localhost:8080/fcrepo/rest and add a container named ‘msg’
  • Now look at your terminal.
    • Can you see bodies and headers?
    • How many messages were emitted? Why?

104 of 131

Let’s try! (continued)

Let’s inspect the JSON message body

  • Press Ctrl+C to exit the listener
  • Install jq (an application that can manipulate JSON)
  • Pipe the JSON body through jq for pretty printing

sudo apt-get install jq

fcr-listen | grep '{' | jq .

105 of 131

Let’s try! (continued)

Let’s inspect the JSON message body

  • Add another resource to Fedora
    • In your browser, go to http://localhost:8080/fcrepo/rest/msg and add a container named ‘another’ (note: you might have to create two resources before the first message is displayed)
  • Look at the pretty-printed JSON. Can you get a sense of what it contains?

106 of 131

Example: Message Headers

expires = 0

org.fcrepo.jms.identifier = /msg

org.fcrepo.jms.user = fedoraAdmin

org.fcrepo.jms.resourceType = http://www.w3.org/ns/ldp#Container,http://fedora.info/definitions/v4/repository#Resource,http://fedora.info/definitions/v4/repository#Container,http://www.w3.org/ns/ldp#RDFSource

destination = /topic/fedora

ack = ID:fedora4-55273-1516219368164-25:1

org.fcrepo.jms.eventType = http://fedora.info/definitions/v4/event#ResourceModification

subscription = 1

priority = 4

org.fcrepo.jms.baseURL = http://localhost:8080/fcrepo/rest

org.fcrepo.jms.eventID = urn:uuid:5fbcc81a-7b1b-42fc-8f31-4561a8b52a94

org.fcrepo.jms.timestamp = 1516223977540

message-id = ID:fedora4-55273-1516219368164-4:1:1:1:43

persistent = true

org.fcrepo.jms.userAgent = Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0

timestamp = 1516223977556
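
A small sketch for working with headers in the "name = value" form shown above. It assumes org.fcrepo.jms.timestamp is milliseconds since the Unix epoch, which is consistent with the 13-digit value in the example; the function names are made up.

```python
from datetime import datetime, timedelta, timezone


def parse_headers(text):
    """Parse 'name = value' lines, as printed above, into a dict."""
    headers = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


def event_time(headers):
    """Convert org.fcrepo.jms.timestamp (assumed: milliseconds since the epoch) to UTC."""
    millis = int(headers["org.fcrepo.jms.timestamp"])
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)


sample = """\
org.fcrepo.jms.identifier = /msg
org.fcrepo.jms.user = fedoraAdmin
org.fcrepo.jms.timestamp = 1516223977540
"""
headers = parse_headers(sample)
print(headers["org.fcrepo.jms.identifier"], event_time(headers).isoformat())
```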

107 of 131

Example: JSON-LD

{
  "id": "http://localhost:8080/fcrepo/rest/msg",
  "type": ["http://www.w3.org/ns/ldp#Container", "http://fedora.info/definitions/v4/repository#Resource"],
  "isPartOf": "http://localhost:8080/fcrepo/rest",
  "wasGeneratedBy": {
    "atTime": "2018-01-17T21:13:36.851Z",
    "identifier": "urn:uuid:9f4ec7d7-3676-4154-a3fe-c3b7b2486860",
    "type": [
      "http://fedora.info/definitions/v4/event#ResourceModification",
      "http://www.w3.org/ns/prov#Activity"
    ]
  },
  "wasAttributedTo": [
    {
      "name": "fedoraAdmin",
      "type": "http://www.w3.org/ns/prov#Person"
    },
    {
      "name": "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0",
      "type": "http://www.w3.org/ns/prov#SoftwareAgent"
    }
  ]
}
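
A sketch of how a client might pull the "anatomy of a message" fields out of a body like this one. The embedded body is a shortened copy of the example, and `summarize` is a hypothetical helper, not part of any Fedora library.

```python
import json

# Shortened copy of the example message body.
body = """
{
  "id": "http://localhost:8080/fcrepo/rest/msg",
  "type": ["http://www.w3.org/ns/ldp#Container"],
  "isPartOf": "http://localhost:8080/fcrepo/rest",
  "wasGeneratedBy": {
    "atTime": "2018-01-17T21:13:36.851Z",
    "type": ["http://fedora.info/definitions/v4/event#ResourceModification"]
  },
  "wasAttributedTo": [
    {"name": "fedoraAdmin", "type": "http://www.w3.org/ns/prov#Person"}
  ]
}
"""


def summarize(message):
    """Collect resource, parent, event types, time and user from a parsed body."""
    activity = message["wasGeneratedBy"]
    people = [agent["name"] for agent in message["wasAttributedTo"]
              if agent["type"].endswith("#Person")]
    return {
        "resource": message["id"],
        "parent": message["isPartOf"],
        "events": [uri.rsplit("#", 1)[-1] for uri in activity["type"]],
        "time": activity["atTime"],
        "user": people[0] if people else None,
    }


summary = summarize(json.loads(body))
```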

108 of 131

Sample Clients

Serialization service writes all repository content to disk. Let’s see what it wrote.

cat /opt/karaf/etc/org.fcrepo.camel.serialization.cfg | grep descriptions

ls /tmp/descriptions/fcrepo/rest

Indexing service updates a triplestore or Solr index in response to changes.

109 of 131

Activity Stream Links

  • Fedora Spec
    https://fcrepo.github.io/fcrepo-specification/#notifications
  • Fedora Camel Tooling
    https://github.com/fcrepo4-exts/fcrepo-camel-toolbox
  • fcr-listen (thanks Aaron Birkland!)
    https://github.com/birkland/fcr-listen
  • fcrepo-camel
    https://github.com/fcrepo4-exts/fcrepo-camel-toolbox

110 of 131

Web Access Control

(WebAC)

Mike Durbin

111 of 131

Authentication and Authorization

AuthN: verification of who you are

verifying a username/password

verifying a token

AuthZ: verification of what you can do

can the client access this resource?

can the client perform this operation?

112 of 131

Fedora does Authorization

113 of 131

WebAC Authorization

  • RDF-based

  • Handles most use cases

  • Solid WebAC LDP applications

114 of 131

Solid Web AC Specification

115 of 131

Hands On WebAC

Scenario:

We’ve got some images that can only be shared with the “adminuser” user.

116 of 131

Hands On WebAC

Create the following containers:

  • “acls” ...at the top-level
  • “acl” ...contained inside “acls”
  • “authorization” ...contained inside “acl”
  • “files” ...contained inside “basic/images”

117 of 131

Final result (structure)

  • basic/
    • images/
      • files/

  • acls/
    • acl/
      • authorization/

118 of 131

Final result (structure)

  • basic/
    • images/
      • files/

  • acls/
    • acl/
      • authorization

“images” must point to its ACL via acl:accessControl

  • An ACL must have one or more authorizations
  • “authorizations” define:
    • agent(s)
    • mode(s)
    • resource(s) or class

119 of 131

Define the “acl” as a webac:Acl

  1. Navigate to your acl: http://localhost:8080/fcrepo/rest/acls/acl
  2. Update the RDF to make it a “webac:Acl”

PREFIX webac: <http://fedora.info/definitions/v4/webac#>
INSERT {
  <> a webac:Acl .
} WHERE { }

120 of 131

Define the “authorization”

  1. Navigate to the resource http://localhost:8080/fcrepo/rest/acls/acl/authorization
  2. Update the properties

PREFIX acl: <http://www.w3.org/ns/auth/acl#>
INSERT {
  <> a acl:Authorization ;
     acl:accessTo </fcrepo/rest/basic/images> ;
     acl:mode acl:Read, acl:Write ;
     acl:agent "adminuser" .
} WHERE { }

121 of 131

Link “acl” to “images” Resource

  1. Navigate to the “images” resource: http://localhost:8080/fcrepo/rest/basic/images
  2. Update the RDF

PREFIX acl: <http://www.w3.org/ns/auth/acl#>
INSERT {
  <> acl:accessControl </fcrepo/rest/acls/acl>
} WHERE { }
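
The same updates can be sent over the HTTP API instead of the HTML UI. Below is a minimal sketch, assuming the workshop VM at localhost:8080 and the fedoraAdmin account; `sparql_patch` is a hypothetical helper that only builds the request, so nothing is sent until you pass it to `urlopen`.

```python
import base64
import urllib.request

BASE = "http://localhost:8080"  # workshop Vagrant VM


def sparql_patch(path, update, user, password):
    """Build (but do not send) a PATCH request carrying a SPARQL update."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        BASE + path,
        data=update.encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/sparql-update",
            "Authorization": "Basic " + token,
        },
    )


link_acl = """\
PREFIX acl: <http://www.w3.org/ns/auth/acl#>
INSERT {
  <> acl:accessControl </fcrepo/rest/acls/acl>
} WHERE { }
"""
request = sparql_patch("/fcrepo/rest/basic/images", link_acl, "fedoraAdmin", "secret3")
# urllib.request.urlopen(request) would send it to a running repository.
```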

122 of 131

List Preconfigured Users

Log into the vagrant VM:

vagrant ssh

View the users configured for tomcat

tail /etc/tomcat7/tomcat-users.xml

123 of 131

Preconfigured Users

<role rolename="fedoraUser"/>

<role rolename="fedoraAdmin"/>

<user username="testuser" password="password1" roles="fedoraUser"/>

<user username="adminuser" password="password2" roles="fedoraUser"/>

<user username="fedoraAdmin" password="secret3" roles="fedoraAdmin"/>

</tomcat-users>

124 of 131

Verify authZ (warning: cURL ahead)

curl -I http://localhost:8080/fcrepo/rest/basic/images
> 401

curl -I -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/basic/images
> 200

curl -I -uadminuser:password2 http://localhost:8080/fcrepo/rest/basic/images
> 200

curl -I -utestuser:password1 http://localhost:8080/fcrepo/rest/basic/images
> 403

125 of 131

Applies to children, as well

curl -I http://localhost:8080/fcrepo/rest/basic/images/files
> 401

curl -I -ufedoraAdmin:secret3 http://localhost:8080/fcrepo/rest/basic/images/files
> 200

curl -I -uadminuser:password2 http://localhost:8080/fcrepo/rest/basic/images/files
> 200

curl -I -utestuser:password1 http://localhost:8080/fcrepo/rest/basic/images/files
> 403
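
A simplified model of the decisions seen in these results. It is a sketch, not Fedora's WebAC implementation: it assumes an authorization covers its target and everything below it, and that the fedoraAdmin account bypasses the ACL check. The `Authorization` class and `allowed` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    access_to: str      # path the authorization applies to
    modes: frozenset    # e.g. {"Read", "Write"}
    agents: frozenset   # user names granted access


def allowed(authorizations, agent, mode, path, admins=frozenset({"fedoraAdmin"})):
    """Return True if agent may use mode on path.

    Simplification: an authorization covers its target and everything below it.
    """
    if agent in admins:
        return True  # assumed: the admin account bypasses the ACL check
    for auth in authorizations:
        covers = path == auth.access_to or path.startswith(auth.access_to + "/")
        if covers and mode in auth.modes and agent in auth.agents:
            return True
    return False


acl = [Authorization("/basic/images", frozenset({"Read", "Write"}), frozenset({"adminuser"}))]
```

In this model, `True` corresponds to the 200 responses and `False` to the 403 responses. The 401 for an unauthenticated request comes from the authentication layer and is outside the model.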

126 of 131

Authentication

NOT done by Fedora, but instead possibly done by:

  1. Servlet container (tomcat) as in previous examples
  2. Institutional single sign-on
  3. Shibboleth

127 of 131

Shibboleth Scenario

The user, “testuser”, is a part of the “adminuser” group.

  • Shibboleth adds group attribute to request header

Configurable header: “some-header”

https://github.com/fcrepo4-exts/fcrepo-webapp-plus/blob/fcrepo-webapp-plus-4.7.3/src/webac/webapp/WEB-INF/classes/spring/auth-repo.xml#L21-L25

128 of 131

Shibboleth Scenario

Fedora Server

Web Server (enforcing Shibboleth Authentication)

Adds header (strips any user-supplied header)

129 of 131

Verify AuthZ - Shibboleth Scenario

curl -I -uadminuser:password2 http://localhost:8080/fcrepo/rest/basic/images
> 200

curl -I -utestuser:password1 http://localhost:8080/fcrepo/rest/basic/images
> 403

curl -I -utestuser:password1 -H"some-header: adminuser" http://localhost:8080/fcrepo/rest/basic/images
> 200
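
One way to picture why the third request succeeds: the names in the configured header are added to the names the request may act as. This is a sketch, not Fedora's code. It assumes the header holds comma-separated names, and `principals` and `can_read` are hypothetical helpers.

```python
def principals(username, request_headers, header_name="some-header"):
    """Collect the names a request may act as: the user plus any names in the header."""
    names = {username}
    raw = request_headers.get(header_name, "")
    # Assumed format: comma-separated names.
    names.update(part.strip() for part in raw.split(",") if part.strip())
    return names


def can_read(request_principals, granted_agents):
    """True if any of the request's names is an agent granted access."""
    return bool(request_principals & granted_agents)


granted = {"adminuser"}  # agent named in the authorization for "images"
without_header = can_read(principals("testuser", {}), granted)
with_header = can_read(principals("testuser", {"some-header": "adminuser"}), granted)
```

This is also why the web server in front of Fedora must strip any copy of the header supplied by the user: otherwise any client could claim a group it does not belong to.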

130 of 131

More Web AC resources

  • Fedora Authorization Spec
    https://fcrepo.github.io/fcrepo-specification/#resource-authorization
  • Fedora Wiki tutorials
    https://wiki.duraspace.org/display/FEDORA4x/WebAC+Authorization+Delegate
    https://wiki.duraspace.org/display/FEDORA4x/Determining+the+Effective+Authorization+Using+WebAC

131 of 131

Contact Info

Yinlin Chen (ylchen@vt.edu)

Esmé Cowles (escowles@princeton.edu)

Mike Durbin (md5wz@virginia.edu)

Fedora Community Resources

http://groups.google.com/d/forum/fedora-tech

https://wiki.duraspace.org/display/FF/Mailing+Lists+etc