The sixth W3C TREE CG meeting
2023-09-27 (Teams)
Agenda
Member Extraction
Current proposal: using CBD + shape hints
What’s a tree:Member?
A tree:Member is a Set of quads.
The triples that are part of the set are defined by “the member extraction algorithm” (see further).
tree:member refers to the focus node (term borrowed from SHACL) of a member that can be used to extract the member.
An ID for the tree:Member itself can be created based on the collection IRI and this focus node’s IRI
[Diagram: a tree:Collection linked via tree:member to the URIs of your entities]
Member extraction algorithm
The algorithm that extracts all triples describing an entity from a set of triples (such as an RDF page, or a message on a pubsub channel), and potentially does HTTP requests to fetch more triples according to well-defined triggers.
The algorithm MUST always return the same set of triples across implementations.
const extractor = new Extractor(shape, dereferencer);
// This function may do HTTP requests to retrieve out-of-band quads
extractor.extractMember(windowQuads, entityIRI);
Example 1: if member extraction doesn’t return anything, do an HTTP request to the entity if this wasn’t done before
<> a tree:Collection;
tree:member
<metasequoia-disticha>, # doesn’t have more quads here, so it should be dereferenced
<metasequoia-foxii>,
<metasequoia-glyptostroboides>;
tree:view <?limit=10>.
<?limit=10> a tree:Node;
tree:relation [ a tree:GreaterThanRelation;
tree:path ex:ultimateHeightInMeters ;
tree:node <?limit=10&offset=10>;
tree:qualifiedValue <pinus-alepensis>
].
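The trigger in Example 1 can be sketched as a small pure function. This is a minimal illustration, not the group's algorithm: the `Quad` type, the function name `shouldDereference` and the `alreadyFetched` set are all hypothetical names introduced here.

```typescript
// Hypothetical minimal quad type; real code would likely use RDF/JS terms.
interface Quad { subject: string; predicate: string; object: string; graph: string; }

// Returns true when the member should be fetched over HTTP: nothing was
// extracted for it locally and its IRI has not been dereferenced before.
function shouldDereference(
  extracted: Quad[],
  entityIRI: string,
  alreadyFetched: Set<string>
): boolean {
  return extracted.length === 0 && !alreadyFetched.has(entityIRI);
}

const fetched = new Set<string>();
// The page only lists <metasequoia-disticha> as a member, without further quads.
const fromPage: Quad[] = [];
const first = shouldDereference(fromPage, "metasequoia-disticha", fetched);
fetched.add("metasequoia-disticha"); // remember that we dereferenced it
const second = shouldDereference(fromPage, "metasequoia-disticha", fetched);
```

Keeping track of what was already fetched is what prevents the same entity from being requested repeatedly.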
Example 2: Concise Bounded Description (CBD) = extract triples with that subject, and their blank nodes recursively
<> a tree:Collection;
tree:member <metasequoia-disticha> .
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
ex:ultimateHeightInMeters "12" ;
ex:subFamily [ ex:name "Sequoioideae" ] , <something-else> .
<something-else> ex:shouldntbe "included" .
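A simplified sketch of the CBD step on the data of Example 2. It assumes triples are plain strings and that blank nodes are written with a `_:` prefix; the `Triple` type and the `cbd` function are hypothetical, and the reification part of the full CBD definition is left out.

```typescript
interface Triple { s: string; p: string; o: string; }

// Assumption of this sketch: blank nodes are labelled with a "_:" prefix.
const isBlank = (term: string): boolean => term.indexOf("_:") === 0;

// Collect all triples with the focus node as subject, and recurse into
// blank-node objects only. Named-node objects are not followed.
function cbd(triples: Triple[], focus: string): Triple[] {
  const result: Triple[] = [];
  const seen = new Set<string>();
  const queue: string[] = [focus];
  while (queue.length > 0) {
    const subject = queue.shift() as string;
    if (seen.has(subject)) continue;
    seen.add(subject);
    for (const t of triples) {
      if (t.s !== subject) continue;
      result.push(t);
      if (isBlank(t.o)) queue.push(t.o);
    }
  }
  return result;
}

const page: Triple[] = [
  { s: "metasequoia-disticha", p: "ex:name", o: "\"Metasequoia Disticha\"" },
  { s: "metasequoia-disticha", p: "ex:ultimateHeightInMeters", o: "\"12\"" },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "_:b0" },
  { s: "_:b0", p: "ex:name", o: "\"Sequoioideae\"" },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "something-else" },
  { s: "something-else", p: "ex:shouldntbe", o: "\"included\"" },
];
const member = cbd(page, "metasequoia-disticha");
```

The triple about `<something-else>` stays out of the member because it is a named node, not a blank node.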
Example 3: Extract triples in a named graph
Mind that the set of member quads also includes the CBD quads
<> a tree:Collection;
tree:member <metasequoia-disticha-v1> .
<metasequoia-disticha-v1> dcterms:created "2012-05-02T12:00" .
<metasequoia-disticha-v1> {
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
ex:ultimateHeightInMeters "12" ;
ex:subFamily [ ex:name "Sequoioideae" ] .
}
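The named-graph case can be sketched in the same spirit. This assumes a hypothetical `Quad` type where an empty graph string stands for the default graph, and it simplifies CBD to the direct triples of the focus node in the default graph; whether CBD should be limited to the default graph is an assumption of this sketch, not something the slides state.

```typescript
// g === "" stands for the default graph in this sketch.
interface Quad { s: string; p: string; o: string; g: string; }

function extractWithNamedGraph(quads: Quad[], focus: string): Quad[] {
  return quads.filter(q =>
    q.g === focus ||               // every quad in the graph named after the member
    (q.g === "" && q.s === focus)  // simplified CBD: direct triples only
  );
}

const v1 = "metasequoia-disticha-v1";
const quads: Quad[] = [
  { s: "", p: "rdf:type", o: "tree:Collection", g: "" },
  { s: "", p: "tree:member", o: v1, g: "" },
  { s: v1, p: "dcterms:created", o: "\"2012-05-02T12:00\"", g: "" },
  { s: "metasequoia-disticha", p: "ex:name", o: "\"Metasequoia Disticha\"", g: v1 },
  { s: "metasequoia-disticha", p: "ex:ultimateHeightInMeters", o: "\"12\"", g: v1 },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "_:b0", g: v1 },
  { s: "_:b0", p: "ex:name", o: "\"Sequoioideae\"", g: v1 },
];
const memberQuads = extractWithNamedGraph(quads, v1);
```

The result holds the four quads of the named graph plus the `dcterms:created` quad from the default graph.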
Example 3 (continued): Extract triples in a named graph
But mind that we don’t apply example 4 here
Mind that the set of member quads also includes the CBD quads
<> a tree:Collection;
tree:member <metasequoia-disticha-v0>, <metasequoia-disticha-v1> .
<metasequoia-disticha-v0> dcterms:created "2012-05-02T11:00" .
<metasequoia-disticha-v1> dcterms:created "2012-05-02T12:00" .
<metasequoia-disticha-v1> {
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
ex:ultimateHeightInMeters "12" ;
ex:otherThing <metasequoia-disticha-v0>, <metasequoia-disticha-v1> ;
ex:subFamily [ ex:name "Sequoioideae" ] .
}
Example 4: Extract more than CBD by taking well-defined hints from a shape
<> a tree:Collection;
tree:member <metasequoia-disticha> ;
tree:shape <Shape.ttl#Family> .
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
ex:ultimateHeightInMeters "12" ;
ex:subFamily <Sequoioideae> .
<Sequoioideae> ex:name "Sequoioideae" .
<Shape.ttl#Family> a sh:NodeShape ;
sh:property [
sh:path ex:name ;
sh:minCount 1
],[
sh:path ex:subFamily ;
sh:node [
sh:property [
sh:path ex:name ;
sh:minCount 1
]
]
] .
The NodeShape indicates that the quads of this named node are included in the member.
Shape template algorithm
interface Shape {
  closed: boolean; // If set to true, don’t apply CBD on the focus node
  requiredPaths: Path[]; // Can trigger an HTTP request if not set
  optionalPaths: Path[]; // Also include the deeper down paths, if they are set
  nodelinks: NodeLink[]; // If this path is set, re-do the algorithm on that named node with the shape linked in the nodelink
  atLeastOneLists: Shape[][]; // The shapes, if they are OK with respect to required paths, may also trigger an HTTP request in the deeper down nodelinks
}
interface NodeLink {
  shape: Shape;
  path: Path;
}
Relies heavily on SHACL property paths for deeper objects
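As an illustration, the `<Shape.ttl#Family>` shape of Example 4 could map to a template roughly as follows. This is one possible reading, not a confirmed mapping: `Path` is reduced to a single predicate string here, and treating `sh:minCount 1` as a required path and `sh:node` as a nodelink is an assumption of this sketch.

```typescript
// Assumption: a path is a single predicate; real paths can be full SHACL property paths.
type Path = string;

interface Shape {
  closed: boolean;
  requiredPaths: Path[];
  optionalPaths: Path[];
  nodelinks: NodeLink[];
  atLeastOneLists: Shape[][];
}
interface NodeLink { shape: Shape; path: Path; }

// The anonymous shape behind sh:node: it requires an ex:name.
const subFamilyShape: Shape = {
  closed: false,
  requiredPaths: ["ex:name"],
  optionalPaths: [],
  nodelinks: [],
  atLeastOneLists: [],
};

// <Shape.ttl#Family>: ex:name is required, ex:subFamily links to another shape.
const familyShape: Shape = {
  closed: false,
  requiredPaths: ["ex:name"],
  optionalPaths: [],
  nodelinks: [{ path: "ex:subFamily", shape: subFamilyShape }],
  atLeastOneLists: [],
};
```

With this template, finding `ex:subFamily <Sequoioideae>` on the member would re-run the extraction on `<Sequoioideae>` using `subFamilyShape`.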
Example 4.1 Open vs. closed shapes
This means the member will only have an ex:name property
<Shape.ttl#Family> a sh:NodeShape ;
sh:closed true ;
sh:property [
sh:path ex:name ;
] .
Example 4.x - Multiple NodeShape
Decision: We’ll go with this NodeShape solution first. If it doesn’t fit certain scenarios, we will re-open the discussion and possibly extend the member extraction algorithm using shapes.
<Shape.ttl#Anything> a sh:NodeShape ;
sh:or (
<Shape.ttl#Dataset>
<Shape.ttl#Distribution>
<Shape.ttl#DataService>
…
) .
Example 4.2 Cardinality design choice
We don’t check maxCount.
If there were more values than maxCount, the extraction wouldn’t know which ones to choose.
We only check that the count is > 0, not the exact minCount.
We only trigger an HTTP request to fetch the current focus node if no values are set. If some values are set but fewer than required, we assume the member is invalid.
<Shape.ttl#Family> a sh:NodeShape ;
sh:property [
sh:path ex:name ;
sh:minCount 1
], [
sh:path ex:subFamily ;
sh:minCount 2 ;
sh:maxCount 3 ;
sh:node [
sh:property [
sh:path ex:name ;
]
]
] .
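The cardinality choice boils down to a very small decision rule, sketched here with hypothetical names (`decide`, `Decision`). The `minCount` and `maxCount` parameters are accepted only to make visible that they are not used.

```typescript
type Decision = "extract" | "dereference";

// count: number of values found for a path on the current focus node.
// _minCount and _maxCount are deliberately ignored in this sketch.
function decide(count: number, _minCount?: number, _maxCount?: number): Decision {
  return count > 0 ? "extract" : "dereference";
}

// ex:subFamily with sh:minCount 2 and sh:maxCount 3, as in the shape above:
const none = decide(0, 2, 3);    // nothing set: fetch the focus node
const tooFew = decide(1, 2, 3);  // below minCount: still no HTTP request
const tooMany = decide(5, 2, 3); // above maxCount: all values are extracted
```

Validation of the exact counts is left to whoever consumes the extracted member.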
Example 4.3 OR example
The subfamily can be another resource with required properties, or it can be a literal value, or it can be both.
If at least one of the items is set, no HTTP request is made.
However, all items in the list that are otherwise valid are extracted.
<Shape.ttl#Family> a sh:NodeShape ;
sh:property [
sh:path ex:name ;
sh:minCount 1
];
sh:or ( [
sh:path ex:subFamily ;
sh:node [
sh:property [
sh:path ex:name ;
sh:minCount 1
]
]
]
[
sh:path ex:subFamily ;
sh:datatype xsd:string
]
).
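The handling of such a list can be sketched as follows. The `Alternative` type and the `processAtLeastOne` function are hypothetical, and whether an alternative is "satisfied" is assumed to be computed elsewhere, for instance by checking its required paths.

```typescript
interface Alternative { name: string; satisfied: boolean; }

interface Outcome { dereference: boolean; extract: string[]; }

// If no alternative is satisfied, an HTTP request is needed.
// Otherwise every satisfied alternative is extracted, not just the first.
function processAtLeastOne(alternatives: Alternative[]): Outcome {
  const valid = alternatives.filter(a => a.satisfied).map(a => a.name);
  return { dereference: valid.length === 0, extract: valid };
}

// ex:subFamily is set both as a resource with an ex:name and as a string literal.
const both = processAtLeastOne([
  { name: "subFamily-as-node", satisfied: true },
  { name: "subFamily-as-string", satisfied: true },
]);
// Neither alternative is present on the focus node.
const neither = processAtLeastOne([
  { name: "subFamily-as-node", satisfied: false },
  { name: "subFamily-as-string", satisfied: false },
]);
```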
Example 4.4 XONE example
The subfamily can be another resource with required properties, or it can be a literal value, but not both.
If at least one of the items is set, no HTTP request is made.
However, all items in the list that are otherwise valid are extracted
⇒ otherwise we wouldn’t know which one to pick
<Shape.ttl#Family> a sh:NodeShape ;
sh:property [
sh:path ex:name ;
sh:minCount 1
];
sh:xone ( [
sh:path ex:subFamily ;
sh:node [
sh:property [
sh:path ex:name ;
sh:minCount 1
]
]
]
[
sh:path ex:subFamily ;
sh:datatype xsd:string
]
).
Example 4.5 Paths
Paths are processed, and the member can thus include complex SHACL property paths.
The example points at the name of the parent family through an inverse property and a sequence path.
The triples needed to reach the path’s goal are included in the member.
<Shape.ttl#Family> a sh:NodeShape ;
sh:property [
sh:path ([sh:inversePath ex:subFamily ] ex:name) ;
sh:minCount 1
] .
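A rough sketch of how a sequence path with an inverse step could be followed while collecting the triples on the way. The `Step` representation, the `followPath` function and the `cupressaceae` and `pinaceae` data are hypothetical; dead-end branches are only handled in the simplest way (an empty result when the goal is never reached).

```typescript
interface Triple { s: string; p: string; o: string; }
interface Step { predicate: string; inverse: boolean; }

// Follows the steps one by one from the focus node and returns the triples
// that were traversed, or nothing if the end of the path is never reached.
function followPath(triples: Triple[], focus: string, path: Step[]): Triple[] {
  let frontier: string[] = [focus];
  const used: Triple[] = [];
  for (const step of path) {
    const next: string[] = [];
    for (const t of triples) {
      if (t.p !== step.predicate) continue;
      const from = step.inverse ? t.o : t.s;
      const to = step.inverse ? t.s : t.o;
      if (frontier.indexOf(from) !== -1) {
        used.push(t);
        next.push(to);
      }
    }
    frontier = next;
  }
  return frontier.length > 0 ? used : [];
}

const data: Triple[] = [
  { s: "cupressaceae", p: "ex:subFamily", o: "sequoioideae" },
  { s: "cupressaceae", p: "ex:name", o: "\"Cupressaceae\"" },
  { s: "pinaceae", p: "ex:name", o: "\"Pinaceae\"" },
];
// ([sh:inversePath ex:subFamily] ex:name), starting from the subfamily
const pathTriples = followPath(data, "sequoioideae", [
  { predicate: "ex:subFamily", inverse: true },
  { predicate: "ex:name", inverse: false },
]);
```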
SHACL to Shape Templates
NodeShapes are processed:
Shape template extraction algorithm
First focus node = tree:member object
!! We don’t support doing an HTTP request based on incomplete paths; it’s up to the designer of the shape to clearly indicate a nodelink.
Extra example
<> a tree:Collection;
tree:member <metasequoia-disticha-v1> .
<metasequoia-disticha-v1> dcterms:created "2012-05-02T12:00" ;
dcterms:isVersionOf <metasequoia-disticha> .
<metasequoia-disticha-v1> {
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
ex:ultimateHeightInMeters "12" ;
ex:subFamily [ ex:name "Sequoioideae" ] .
}
Next steps
Any other business
Additional slides on relation handling
Simple example from what already exists out there
Doubly linked list: every tree:Node is a possible entry point with a “next” and/or “previous” page link
This is really annoying though: the fact the TREE spec allows back-links means we MUST keep state (or some kind of bookmark with a traversal direction – see further), and this state may become really big if the view has many nodes.
The benefit however is that you can enter through any node, and still find all members if you want that. This may be useful in combination with search forms (currently unsupported in the LDES client).
[Diagram: a doubly linked list of nodes N1 … Ni … Nn, with relations R1.1, Ri.1, Ri.2 and Rn.1]
A relation
Expresses one condition for a client to jump from one node to another node (this is important as the relation is contextual to the current node)
The client must prune relations to nodes it already visited.
(in order to ensure it always is a search tree in the eyes of the client)
Multiple relations to the same node can be set, and must be processed together (logical AND).
(This makes it possible to power more interesting search trees, such as B-Trees, which document an interval for every link to another node)
The search tree
While from a macro-perspective, the information architecture looks like a graph
[Diagram: nodes N1, N2 and N3, with relations R1.1, R1.2, R2.1, R3.1 and R3.2]
The search tree (this slide has animations)
The client has to prune relations to nodes it already visited
[Diagram: nodes N1, N2 and N3, with relations R1.1, R1.2, R2.1, R3.1 and R3.2]
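The pruning rule can be sketched as a breadth-first traversal that never follows a relation to a node it has already seen. The `View` structure and the relation targets below are assumptions made for illustration (the slides do not spell out where each relation points), and relation conditions are ignored here.

```typescript
interface Relation { id: string; node: string; }
type View = Record<string, Relation[]>;

function traverse(view: View, entry: string): { visited: string[]; followed: string[] } {
  const visited: string[] = [];
  const followed: string[] = [];
  const seen = new Set<string>([entry]);
  const queue: string[] = [entry];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    visited.push(node);
    for (const rel of view[node] || []) {
      if (seen.has(rel.node)) continue; // prune: target already known
      seen.add(rel.node);
      followed.push(rel.id);
      queue.push(rel.node);
    }
  }
  return { visited, followed };
}

// Assumed targets for the relations in the diagram.
const view: View = {
  N1: [{ id: "R1.1", node: "N2" }, { id: "R1.2", node: "N3" }],
  N2: [{ id: "R2.1", node: "N3" }],
  N3: [{ id: "R3.1", node: "N1" }, { id: "R3.2", node: "N2" }],
};
const walk = traverse(view, "N1");
```

Starting from N1, only R1.1 and R1.2 are followed; the remaining relations point at known nodes and are pruned, so the client sees a tree even though the view is a graph.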
Conceptually, each deeper down link means an AND with all previous relations
This is important for reachability when designing TREE structures. For the client it will be implicitly true, as the client will otherwise have pruned this subtree.
[Diagram: nodes N1, N2 and N3, with relations R1.2 and R2.1]
Members in N3, from the client’s perspective, will adhere to R1.2 AND R2.1
Multiple relations to the same node
⇒ MUST be combined with a logical AND
[Diagram: nodes N1 and N2, with two relations R1.2 and R1.3 between them]
<R1.2> a tree:GreaterThanRelation ;
tree:node <N2> ;
tree:value 5 .
<R1.3> a tree:LessThanRelation ;
tree:node <N2> ;
tree:value 10 .
[Diagram labels: R1.2 is “>5”, R1.3 is “<10”]
A client will visit N2 (and the nodes linked from N2) when it is interested in members between 5 and 10
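The logical AND can be sketched by grouping relations per target node and requiring all of them to hold. The types and the `shouldVisit` function are hypothetical, and the client's interest is reduced to a single numeric value for brevity.

```typescript
type RelationType = "GreaterThan" | "LessThan";
interface Relation { type: RelationType; node: string; value: number; }

function holds(rel: Relation, x: number): boolean {
  return rel.type === "GreaterThan" ? x > rel.value : x < rel.value;
}

// A node is worth visiting for value x only if every relation pointing to it holds.
function shouldVisit(relations: Relation[], node: string, x: number): boolean {
  const toNode = relations.filter(r => r.node === node);
  return toNode.length > 0 && toNode.every(r => holds(r, x));
}

const relationsOfN1: Relation[] = [
  { type: "GreaterThan", node: "N2", value: 5 }, // R1.2
  { type: "LessThan", node: "N2", value: 10 },   // R1.3
];
const visitFor7 = shouldVisit(relationsOfN1, "N2", 7);
const visitFor3 = shouldVisit(relationsOfN1, "N2", 3);
const visitFor12 = shouldVisit(relationsOfN1, "N2", 12);
```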
Caveat: design the relations not towards the next node’s members, but towards all members reachable from that node
[Diagram: N1 links to N2 through R1.2 (“>5”) and R1.3 (“<10”); N2 links to N3 through R2.1]
N3 will only be visited/reachable by the client, when the client is interested in members between 5 and 10, regardless of what R2.1 says
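The caveat can be made concrete by intersecting intervals along the path. This is a sketch under assumptions: relations are reduced to open numeric intervals, and the condition used for R2.1 (greater than 20) is hypothetical, chosen only to show what goes wrong when a relation describes values outside what is reachable.

```typescript
// Open interval (min, max).
interface Interval { min: number; max: number; }

function intersect(a: Interval, b: Interval): Interval {
  return { min: Math.max(a.min, b.min), max: Math.min(a.max, b.max) };
}
const isEmpty = (i: Interval): boolean => i.min >= i.max;

const everything: Interval = { min: -Infinity, max: Infinity };
const r12: Interval = { min: 5, max: Infinity };   // R1.2: > 5
const r13: Interval = { min: -Infinity, max: 10 }; // R1.3: < 10

// What a client can still be looking for when it arrives in N2.
const reachableInN2 = intersect(intersect(everything, r12), r13);

// Hypothetical R2.1: > 20. Combined with the path so far, nothing is left.
const r21: Interval = { min: 20, max: Infinity };
const reachableInN3 = intersect(reachableInN2, r21);
```

With these numbers `reachableInN3` is empty: no client would ever follow R2.1, so members placed in N3 could not be found from N1.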