1 of 30

The sixth W3C TREE CG meeting

2023-09-27 (Teams)

2 of 30

Agenda

  1. Introductions
  2. Issue 71: the member extraction algorithm
  3. Any other business: issue prioritization, the next call

3 of 30

Member Extraction

Current proposal: using CBD + shape hints

https://github.com/pietercolpaert/extract-cbd-shape

4 of 30

What’s a tree:Member?

A tree:Member is a Set of quads.

The triples that are part of the set are defined by “the member extraction algorithm” (see further).

tree:member refers to the focus node (term borrowed from SHACL) of a member that can be used to extract the member.

An ID for the tree:Member itself can be created based on the collection IRI and this focus node’s IRI

URIs of your entities

tree:Collection

tree:member

...

5 of 30

Member extraction algorithm

The algorithm that extracts all triples describing an entity from a set of triples (such as an RDF page, or a message on a pubsub channel), and potentially does HTTP requests to fetch more triples according to well-defined triggers.

The algorithm MUST always return the same set of triples across implementations

extractor = new Extractor(shape, dereferencer);

//This function may do HTTP requests to retrieve out of band quads

extractor.extractMember(windowQuads, entityIRI);

6 of 30

Example 1: if member extraction doesn’t return anything, do an HTTP request to the entity if this wasn’t done before

This appears in issue #77

<> a tree:Collection;

tree:member

<metasequoia-disticha>, # doesn’t have more quads here, so it should be dereferenced

<metasequoia-foxii>,

<metasequoia-glyptostroboides>;

tree:view <?limit=10>.

<?limit=10> a tree:Node;

tree:relation [ a tree:GreaterThanRelation;

tree:path ex:ultimateHeightInMeters ;

tree:node <?limit=10&offset=10>;

tree:qualifiedValue <pinus-alepensis>

].

7 of 30

Example 2: Concise Bounded Description (CBD) = extract triples with that subject, and their blank nodes recursively

<> a tree:Collection;

tree:member <metasequoia-disticha> .

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;

ex:ultimateHeightInMeters "12" ;

ex:subFamily [ ex:name "Sequoioideae" ] , <something-else> .

<something-else> ex:shouldntbe "included" .

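The CBD rule of Example 2 can be sketched as below. This is a minimal illustration, not the code of the linked repository: terms are plain strings (an assumption made for brevity), the blank node is labelled `_:b0`, the collection IRI is written as `collection`, and the reification part of the full CBD definition is left out.

```typescript
// Terms are plain strings in this sketch: "_:b0" is a blank node,
// a quoted string is a literal, anything else is an IRI.
type Quad = { s: string; p: string; o: string };

const isBlank = (term: string): boolean => term.slice(0, 2) === "_:";

// Keep every quad with the focus node as subject and recurse into
// blank-node objects only. Named-node objects are not followed.
function cbd(quads: Quad[], focus: string, seen: Set<string> = new Set()): Quad[] {
  if (seen.has(focus)) return []; // guards against blank-node cycles
  seen.add(focus);
  const result: Quad[] = [];
  for (const q of quads) {
    if (q.s !== focus) continue;
    result.push(q);
    if (isBlank(q.o)) result.push(...cbd(quads, q.o, seen));
  }
  return result;
}

// The page from Example 2.
const page: Quad[] = [
  { s: "collection", p: "tree:member", o: "metasequoia-disticha" },
  { s: "metasequoia-disticha", p: "ex:name", o: '"Metasequoia Disticha"' },
  { s: "metasequoia-disticha", p: "ex:ultimateHeightInMeters", o: '"12"' },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "_:b0" },
  { s: "_:b0", p: "ex:name", o: '"Sequoioideae"' },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "something-else" },
  { s: "something-else", p: "ex:shouldntbe", o: '"included"' },
];

const member = cbd(page, "metasequoia-disticha");
console.log(member.length); // 5: the triple about <something-else> itself is left out
```

The link to `<something-else>` is part of the member, but the triples describing `<something-else>` are not.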
8 of 30

Example 3: Extract triples in a named graph

Mind that the set of member quads also includes the CBD quads

<> a tree:Collection;

tree:member <metasequoia-disticha-v1> .

<metasequoia-disticha-v1> dcterms:created "2012-05-02T12:00" .

<metasequoia-disticha-v1> {

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;

ex:ultimateHeightInMeters "12" ;

ex:subFamily [ ex:name "Sequoioideae" ] .

}
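A rough sketch of Example 3, assuming quads are plain objects with a `g` field for the graph name (an empty string stands for the default graph). For brevity the CBD part is reduced to "quads with the member IRI as subject"; the blank-node recursion is omitted.

```typescript
type Quad = { s: string; p: string; o: string; g: string }; // g === "" is the default graph

// Member quads = every quad in the graph named after the member,
// plus the quads that have the member IRI as subject.
function extractNamedGraphMember(quads: Quad[], memberIri: string): Quad[] {
  return quads.filter((q) => q.g === memberIri || q.s === memberIri);
}

const v1 = "metasequoia-disticha-v1";
const trigPage: Quad[] = [
  { s: "collection", p: "tree:member", o: v1, g: "" },
  { s: v1, p: "dcterms:created", o: '"2012-05-02T12:00"', g: "" },
  { s: "metasequoia-disticha", p: "ex:name", o: '"Metasequoia Disticha"', g: v1 },
  { s: "metasequoia-disticha", p: "ex:ultimateHeightInMeters", o: '"12"', g: v1 },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "_:b0", g: v1 },
  { s: "_:b0", p: "ex:name", o: '"Sequoioideae"', g: v1 },
];

const versionMember = extractNamedGraphMember(trigPage, v1);
console.log(versionMember.length); // 5: dcterms:created plus the four quads in the named graph
```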

9 of 30

Example 3: Extract triples in a named graph

But mind that we don’t apply example 4 here

Mind that the set of member quads also includes the CBD quads

<> a tree:Collection;

tree:member <metasequoia-disticha-v0>, <metasequoia-disticha-v1> .

<metasequoia-disticha-v0> dcterms:created "2012-05-02T11:00" .

<metasequoia-disticha-v1> dcterms:created "2012-05-02T12:00" .

<metasequoia-disticha-v1> {

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;

ex:ultimateHeightInMeters "12" ;

ex:otherThing <metasequoia-disticha-v0>, <metasequoia-disticha-v1> ;

ex:subFamily [ ex:name "Sequoioideae" ] .

}

10 of 30

Example 4: Extract more than CBD by taking well-defined hints from a shape

<> a tree:Collection;

tree:member <metasequoia-disticha> ;

tree:shape <Shape.ttl#Family> .

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;

ex:ultimateHeightInMeters "12" ;

ex:subFamily <Sequoioideae> .

<Sequoioideae> ex:name "Sequoioideae" .

<Shape.ttl#Family> a sh:NodeShape ;

sh:property [

sh:path ex:name ;

sh:minCount 1

],[

sh:path ex:subFamily ;

sh:node [

sh:property [

sh:path ex:name ;

sh:minCount 1

]

]

] .

The NodeShape indicates that the quads of this NamedNode are included in this member

11 of 30

Shape template algorithm

Shape {

closed: boolean, // If set to true, don’t apply CBD on the focus node

requiredPaths: Path[], // Can trigger an HTTP request if not set

optionalPaths: Path[], // Also include the deeper down paths, if they are set

nodelinks: NodeLink[], // If this path is set, we need to re-do the algorithm on that named node with the shape linked in the nodelink.

atLeastOneLists: [ Shape[] ] // The shapes, if they are ok wrt required paths, may also trigger an HTTP request in the deeper down nodelinks

}

NodeLink {

shape: Shape,

path: Path

}

Relies heavily on SHACL property paths for deeper objects
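Written as TypeScript types, the template above could look like the sketch below. `Path` is reduced to a list of predicate IRIs, which is an assumption standing in for full SHACL property paths, and `openShape` is a hypothetical helper. The instance mirrors the shape of Example 4, reading it so that `ex:subFamily` is both an optional path and a nodelink.

```typescript
// Assumption: a path is a sequence of predicate IRIs.
type Path = string[];

interface NodeLink {
  shape: Shape;
  path: Path;
}

interface Shape {
  closed: boolean; // true: don't apply CBD on the focus node
  requiredPaths: Path[]; // can trigger an HTTP request if not set
  optionalPaths: Path[];
  nodelinks: NodeLink[];
  atLeastOneLists: Shape[][]; // one inner list per sh:or / sh:xone
}

// Hypothetical helper: an open shape with the given parts filled in.
function openShape(parts: Partial<Shape> = {}): Shape {
  return {
    closed: false,
    requiredPaths: [],
    optionalPaths: [],
    nodelinks: [],
    atLeastOneLists: [],
    ...parts,
  };
}

// Template for <Shape.ttl#Family> from Example 4.
const subFamilyTemplate = openShape({ requiredPaths: [["ex:name"]] });
const familyTemplate = openShape({
  requiredPaths: [["ex:name"]],
  optionalPaths: [["ex:subFamily"]],
  nodelinks: [{ path: ["ex:subFamily"], shape: subFamilyTemplate }],
});

console.log(JSON.stringify(familyTemplate, null, 2));
```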

12 of 30

Example 4.1 Open vs. closed shapes

This means the member will only have an ex:name property

<Shape.ttl#Family> a sh:NodeShape ;

sh:closed true ;

sh:property [

sh:path ex:name ;

] .

13 of 30

Example 4.x - Multiple NodeShape

This means the member will only have an ex:name property

Decision: We’ll go with this NodeShape solution first. If it doesn’t fit certain scenarios, we will re-open the discussion and possibly extend the member extraction algorithm using shapes.

<Shape.ttl#Anything> a sh:NodeShape ;

sh:or (

<Shape.ttl#Dataset>

<Shape.ttl#Distribution>

<Shape.ttl#DataService>

) .

14 of 30

Example 4.2 Cardinality design choice

We don’t check maxCount

We wouldn’t know which one to choose otherwise in the extraction if there were more

We only check > 0, not the exact minCount

We only trigger an HTTP request to fetch the current focus node if there are none set. If there are insufficient properties set, we assume the member is invalid.

<Shape.ttl#Family> a sh:NodeShape ;

sh:property [

sh:path ex:name ;

sh:minCount 1

], [

sh:path ex:subFamily ;

sh:minCount 2 ;

sh:maxCount 3 ;

sh:node [

sh:property [

sh:path ex:name ;

]

]

] .
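The cardinality choices can be illustrated with a small sketch. `PropertyConstraint` and `evaluateProperty` are hypothetical names; the function only encodes the three rules stated on this slide.

```typescript
interface PropertyConstraint {
  path: string;
  minCount?: number;
  maxCount?: number;
}

// valuesFound: the objects found in-band for this path on the focus node.
function evaluateProperty(constraint: PropertyConstraint, valuesFound: string[]) {
  const required = (constraint.minCount ?? 0) > 0; // only "> 0" matters, not the exact minCount
  return {
    required,
    triggersRequest: required && valuesFound.length === 0, // only when none are set
    extracted: valuesFound, // maxCount is not checked: every value found is kept
  };
}

const subFamily: PropertyConstraint = { path: "ex:subFamily", minCount: 2, maxCount: 3 };
console.log(evaluateProperty(subFamily, []).triggersRequest); // true
console.log(evaluateProperty(subFamily, ["a"]).triggersRequest); // false, although minCount is 2
console.log(evaluateProperty(subFamily, ["a", "b", "c", "d"]).extracted.length); // 4, although maxCount is 3
```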

15 of 30

Example 4.3 OR example

The subfamily can be another resource with required properties, or it can be a literal value, or it can be both.

If at least one of the items is set, it’s not doing an HTTP request

It however extracts all the items in the list that are otherwise valid.

<Shape.ttl#Family> a sh:NodeShape ;

sh:property [

sh:path ex:name ;

sh:minCount 1

];

sh:or ( [

sh:path ex:subFamily ;

sh:node [

sh:property [

sh:path ex:name ;

sh:minCount 1

]

]

]

[

sh:path ex:subFamily ;

sh:datatype xsd:string

]

).

16 of 30

Example 4.4 XONE example

The subfamily can be another resource with required properties, or it can be a literal value, but not both.

If at least one of the items is set, it’s not doing an HTTP request

It however extracts all the items in the list that are otherwise valid

⇒ we wouldn’t know otherwise which one to pick

<Shape.ttl#Family> a sh:NodeShape ;

sh:property [

sh:path ex:name ;

sh:minCount 1

];

sh:xone ( [

sh:path ex:subFamily ;

sh:node [

sh:property [

sh:path ex:name ;

sh:minCount 1

]

]

]

[

sh:path ex:subFamily ;

sh:datatype xsd:string

]

).
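Examples 4.3 and 4.4 behave the same during extraction, which the sketch below tries to show. The two list items are modelled as hypothetical functions that return the quads they match; the `sh:datatype` check is simplified to "the object is a literal".

```typescript
type Quad = { s: string; p: string; o: string }; // literals are quoted strings, anything else is an IRI here
type Alternative = (quads: Quad[], focus: string) => Quad[]; // quads matched by one list item, [] if not set

const isLiteral = (term: string): boolean => term.charAt(0) === '"';

// First list item: ex:subFamily points at a resource that has an ex:name.
const subFamilyAsNode: Alternative = (quads, focus) => {
  const matched: Quad[] = [];
  const links = quads.filter((q) => q.s === focus && q.p === "ex:subFamily" && !isLiteral(q.o));
  for (const link of links) {
    const names = quads.filter((q) => q.s === link.o && q.p === "ex:name");
    if (names.length > 0) matched.push(link, ...names);
  }
  return matched;
};

// Second list item: ex:subFamily is a literal.
const subFamilyAsLiteral: Alternative = (quads, focus) =>
  quads.filter((q) => q.s === focus && q.p === "ex:subFamily" && isLiteral(q.o));

// sh:or and sh:xone alike: request only if no item is set, extract every item that is.
function evaluateList(list: Alternative[], quads: Quad[], focus: string) {
  const matches = list.map((alternative) => alternative(quads, focus));
  return {
    triggersRequest: matches.every((m) => m.length === 0),
    extracted: matches.reduce((all, m) => all.concat(m), [] as Quad[]),
  };
}

const both: Quad[] = [
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "Sequoioideae" },
  { s: "Sequoioideae", p: "ex:name", o: '"Sequoioideae"' },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: '"Sequoioideae"' },
];
const outcome = evaluateList([subFamilyAsNode, subFamilyAsLiteral], both, "metasequoia-disticha");
console.log(outcome.triggersRequest, outcome.extracted.length); // false 3
```

With this data a validator would reject the sh:xone variant, but the extraction still returns both alternatives.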

17 of 30

Example 4.5 Paths

Paths are processed, and the member can thus include complex SHACL property paths.

The example points at the name of the parent family through an inverse property and a sequence path.

The triples needed to reach the path’s goal are included in the member.

<Shape.ttl#Family> a sh:NodeShape ;

sh:property [

sh:path ([sh:inversePath ex:subFamily ] ex:name) ;

sh:minCount 1

] .

18 of 30

SHACL to Shape Templates

NodeShapes are processed:

  • If there’s a sh:property
    • check for sh:minCount > 0 → add to the required paths, else to the optional paths
    • check for sh:node → add to nodelinks
  • Conditionals:
    • Both sh:or and sh:xone are processed into the atLeastOneLists array. Why? Because if none are set, we are certain data is lacking for both sh:or and sh:xone. If one or more are set, the data may still be invalid, but since we don’t fully validate data, we have to accept this.
    • sh:and embeds the linked NodeShape or PropertyShape as part of the current shape
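The mapping rules above can be sketched as follows. The input is a hypothetical, already-parsed object model of a NodeShape (not RDF), paths are single predicates, and the items of sh:or, sh:xone and sh:and are all modelled as nested NodeShapes; these are simplifications made for the example.

```typescript
interface PropertyShape { path: string; minCount?: number; node?: NodeShape; }
interface NodeShape {
  closed?: boolean;
  property?: PropertyShape[];
  or?: NodeShape[];
  xone?: NodeShape[];
  and?: NodeShape[];
}

interface Shape {
  closed: boolean;
  requiredPaths: string[];
  optionalPaths: string[];
  nodelinks: NodeLink[];
  atLeastOneLists: Shape[][];
}
interface NodeLink { shape: Shape; path: string; }

function toTemplate(nodeShape: NodeShape, shape?: Shape): Shape {
  const target: Shape =
    shape ?? { closed: false, requiredPaths: [], optionalPaths: [], nodelinks: [], atLeastOneLists: [] };
  if (nodeShape.closed) target.closed = true;
  for (const property of nodeShape.property ?? []) {
    // sh:minCount > 0 makes the path required, otherwise it is optional.
    if ((property.minCount ?? 0) > 0) target.requiredPaths.push(property.path);
    else target.optionalPaths.push(property.path);
    // sh:node adds a nodelink with its own template.
    if (property.node) target.nodelinks.push({ path: property.path, shape: toTemplate(property.node) });
  }
  // sh:or and sh:xone both end up in atLeastOneLists.
  for (const list of [nodeShape.or, nodeShape.xone]) {
    if (list) target.atLeastOneLists.push(list.map((item) => toTemplate(item)));
  }
  // sh:and embeds the linked shapes in the current template.
  for (const item of nodeShape.and ?? []) toTemplate(item, target);
  return target;
}

// Example 4.3 in this object model.
const familyWithOr: NodeShape = {
  property: [{ path: "ex:name", minCount: 1 }],
  or: [
    { property: [{ path: "ex:subFamily", node: { property: [{ path: "ex:name", minCount: 1 }] } }] },
    { property: [{ path: "ex:subFamily" }] }, // the sh:datatype constraint is not modelled
  ],
};
const template = toTemplate(familyWithOr);
console.log(JSON.stringify(template, null, 2));
```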

19 of 30

Shape template extraction algorithm

First focus node = tree:member object

  1. Start from focus node. If shape isn’t closed, apply CBD.
  2. If this is a named node and it wasn’t requested before:
    1. Test whether all required properties are set. If not, do an HTTP request; if so, continue with ↓
    2. Test whether at least one item of each list in the atLeastOneLists is set. If not, do an HTTP request.
  3. Visit all paths (required, optional, nodelinks, and recursively the shapes in the atLeastOneLists if their required paths are set) and add all quads necessary to reach the targets to the result
  4. For the results of nodelinks, if the target is a named node, set it as a focus node and repeat this algorithm with that nodelink’s shape as a shape

!! We don’t support doing an HTTP request based on incomplete paths, it’s up to the designer of the shape to clearly indicate a nodelink
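The steps above can be sketched in a few dozen lines. This is an interpretation under several assumptions, not the reference implementation: paths are single predicates, terms are plain strings, the dereferencer is a synchronous stand-in for an HTTP request, and the sketch dereferences before applying CBD so that fetched quads end up in the member.

```typescript
type Quad = { s: string; p: string; o: string };
interface Shape {
  closed: boolean;
  requiredPaths: string[];
  optionalPaths: string[];
  nodelinks: NodeLink[];
  atLeastOneLists: Shape[][];
}
interface NodeLink { path: string; shape: Shape; }
type Dereferencer = (iri: string) => Quad[]; // stand-in for an HTTP request

const isBlank = (t: string): boolean => t.slice(0, 2) === "_:";
const isNamed = (t: string): boolean => !isBlank(t) && t.charAt(0) !== '"';
const values = (store: Quad[], s: string, p: string): Quad[] =>
  store.filter((q) => q.s === s && q.p === p);
const requiredSet = (store: Quad[], focus: string, shape: Shape): boolean =>
  shape.requiredPaths.every((p) => values(store, focus, p).length > 0);

function cbd(store: Quad[], focus: string, seen: Set<string> = new Set()): Quad[] {
  if (seen.has(focus)) return [];
  seen.add(focus);
  const out: Quad[] = [];
  for (const q of store) {
    if (q.s !== focus) continue;
    out.push(q);
    if (isBlank(q.o)) out.push(...cbd(store, q.o, seen));
  }
  return out;
}

function visit(store: Quad[], focus: string, shape: Shape, dereference: Dereferencer,
               requested: Set<string>, out: Quad[], nested = false): void {
  if (!nested) {
    // Step 2: a named node is dereferenced at most once, and only when data is lacking.
    if (isNamed(focus) && !requested.has(focus)) {
      const lacking = !requiredSet(store, focus, shape) ||
        shape.atLeastOneLists.some((list) => !list.some((alt) => requiredSet(store, focus, alt)));
      if (lacking) {
        requested.add(focus);
        store.push(...dereference(focus));
      }
    }
    // Step 1: apply CBD unless the shape is closed.
    if (!shape.closed) out.push(...cbd(store, focus));
  }
  // Step 3: visit required paths, optional paths and nodelink paths.
  const paths = shape.requiredPaths.concat(shape.optionalPaths, shape.nodelinks.map((l) => l.path));
  for (const p of paths) out.push(...values(store, focus, p));
  for (const list of shape.atLeastOneLists)
    for (const alt of list)
      if (requiredSet(store, focus, alt)) visit(store, focus, alt, dereference, requested, out, true);
  // Step 4: repeat on named nodes reached through nodelinks.
  for (const link of shape.nodelinks)
    for (const q of values(store, focus, link.path))
      if (isNamed(q.o)) visit(store, q.o, link.shape, dereference, requested, out);
}

function extractMember(quads: Quad[], focus: string, shape: Shape, dereference: Dereferencer): Quad[] {
  const out: Quad[] = [];
  visit(quads.slice(), focus, shape, dereference, new Set<string>(), out); // work on a copy
  const seen = new Set<string>();
  return out.filter((q) => { // drop duplicates collected by CBD and path visits
    const key = JSON.stringify([q.s, q.p, q.o]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Example 4, with the name of <Sequoioideae> only available out of band.
const leaf = (requiredPaths: string[]): Shape =>
  ({ closed: false, requiredPaths, optionalPaths: [], nodelinks: [], atLeastOneLists: [] });
const family: Shape = { ...leaf(["ex:name"]), optionalPaths: ["ex:subFamily"],
  nodelinks: [{ path: "ex:subFamily", shape: leaf(["ex:name"]) }] };
const windowQuads: Quad[] = [
  { s: "metasequoia-disticha", p: "ex:name", o: '"Metasequoia Disticha"' },
  { s: "metasequoia-disticha", p: "ex:ultimateHeightInMeters", o: '"12"' },
  { s: "metasequoia-disticha", p: "ex:subFamily", o: "Sequoioideae" },
];
const requests: string[] = [];
const fakeDereference: Dereferencer = (iri) => {
  requests.push(iri);
  return iri === "Sequoioideae" ? [{ s: "Sequoioideae", p: "ex:name", o: '"Sequoioideae"' }] : [];
};
const extracted = extractMember(windowQuads, "metasequoia-disticha", family, fakeDereference);
console.log(extracted.length, requests.join(",")); // 4 Sequoioideae
```

Recursion ends because each nodelink descends into a sub-shape of a finite shape template.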

20 of 30

Extra example

<> a tree:Collection;

tree:member <metasequoia-disticha-v1> .

<metasequoia-disticha-v1> dcterms:created "2012-05-02T12:00" ;

dcterms:isVersionOf <metasequoia-disticha> .

<metasequoia-disticha-v1> {

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;

ex:ultimateHeightInMeters "12" ;

ex:subFamily [ ex:name "Sequoioideae" ] .

}

21 of 30

Next steps

Did we reach a consensus and shall we merge the PR?

https://github.com/TREEcg/specification/pull/78

22 of 30

Any other business

  1. Issue prioritization
    1. Proofreading spec
      1. Remove conditional imports?
      2. A thorough read-through session with everyone?
    2. Iterators
  2. Next community group call
    • 11th of October at 15:00 CET?

23 of 30

Additional slides on relation handling

24 of 30

Simple example from what already exists out there

Doubly linked list: every tree:Node is a possible entry point with a “next” and/or “previous” page link

This is really annoying though: the fact the TREE spec allows back-links means we MUST keep state (or some kind of bookmark with a traversal direction – see further), and this state may become really big if the view has many nodes.

The benefit however is that you can enter through any node, and still find all members if you want that. This may be useful in combination with search forms (currently unsupported in the LDES client).

[Diagram: nodes N1, …, Ni, …, Nn connected by relations R1.1, Ri.1, Ri.2 and Rn.1]

25 of 30

A relation

Expresses one condition for a client to jump from one node to another node (this is important as the relation is contextual to the current node)

The client must prune relations to nodes it already visited.

(in order to ensure it always is a search tree in the eyes of the client)

Multiple relations to the same node can be set, and must be processed together (logical AND).

(This can power more interesting search trees, such as B-trees, which document an interval for every link to another node)
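A minimal sketch of the pruning rule, assuming an in-memory map of pages instead of HTTP requests (`Page`, `collectMembers` and the member names are hypothetical). The `visited` set is the state a client has to keep when back-links are allowed.

```typescript
type Relation = { node: string };
type Page = { members: string[]; relations: Relation[] };

function collectMembers(pages: Record<string, Page>, entry: string): string[] {
  const visited = new Set<string>(); // grows with the number of nodes in the view
  const queue: string[] = [entry];
  const members: string[] = [];
  while (queue.length > 0) {
    const url = queue.shift() as string;
    if (visited.has(url)) continue;
    visited.add(url);
    const page = pages[url];
    if (!page) continue;
    members.push(...page.members);
    for (const relation of page.relations) {
      // Prune relations to nodes that were already visited.
      if (!visited.has(relation.node)) queue.push(relation.node);
    }
  }
  return members;
}

// A doubly linked list of three nodes, entered in the middle.
const view: Record<string, Page> = {
  N1: { members: ["m1"], relations: [{ node: "N2" }] },
  N2: { members: ["m2"], relations: [{ node: "N1" }, { node: "N3" }] },
  N3: { members: ["m3"], relations: [{ node: "N2" }] },
};
console.log(collectMembers(view, "N2").join(", ")); // m2, m1, m3
```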

26 of 30

The search tree

While from a macro-perspective, the information architecture looks like a graph

[Diagram: nodes N1, N2 and N3 connected by relations R1.1, R1.2, R2.1, R3.1 and R3.2]

27 of 30

The search tree (this slide has animations)

The client has to prune relations to nodes it already visited

[Animated diagram: nodes N1, N2 and N3 with relations R1.1, R1.2, R2.1, R3.1 and R3.2]

28 of 30

Conceptually, each deeper down link means an AND with all previous relations

Important for reachability when designing TREE structures. For the client this is implicitly true, as it would otherwise have pruned this subtree.

[Diagram: nodes N1, N2 and N3, with relations R1.2 and R2.1 on the path from N1 to N3]

Members in N3, from the client’s perspective, will adhere to R1.2 AND R2.1

29 of 30

Multiple relations to the same node

⇒ MUST be combined with a logical AND

[Diagram: N1 links to N2 through two relations, R1.2 (>5) and R1.3 (<10)]

<R1.2> a tree:GreaterThanRelation ;

tree:node <N2> ;

tree:value 5 .

<R1.3> a tree:LessThanRelation ;

tree:node <N2> ;

tree:value 10 .

A client will visit N2 (and the nodes linked from N2) when it is interested in members between 5 and 10
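The AND rule can be sketched as below. Relations are plain objects here and the client's interest is a closed interval; both are assumptions made for the example.

```typescript
type Relation = {
  type: "GreaterThanRelation" | "LessThanRelation";
  node: string;
  value: number;
};
type Interest = { min: number; max: number }; // the values the client is looking for

// Can members behind this single relation fall inside the client's interest?
function mayMatch(relation: Relation, interest: Interest): boolean {
  return relation.type === "GreaterThanRelation"
    ? interest.max > relation.value
    : interest.min < relation.value;
}

// All relations pointing to the same node are combined with a logical AND.
function shouldVisit(relations: Relation[], node: string, interest: Interest): boolean {
  const toNode = relations.filter((r) => r.node === node);
  return toNode.length > 0 && toNode.every((r) => mayMatch(r, interest));
}

const relationsOfN1: Relation[] = [
  { type: "GreaterThanRelation", node: "N2", value: 5 }, // R1.2
  { type: "LessThanRelation", node: "N2", value: 10 }, // R1.3
];
console.log(shouldVisit(relationsOfN1, "N2", { min: 6, max: 8 })); // true
console.log(shouldVisit(relationsOfN1, "N2", { min: 11, max: 20 })); // false: R1.3 rules it out
console.log(shouldVisit(relationsOfN1, "N2", { min: 0, max: 3 })); // false: R1.2 rules it out
```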

30 of 30

Caveat: design the relations not towards the next node’s members, but towards all members reachable from that node

[Diagram: N1 links to N2 through relations R1.2 (>5) and R1.3 (<10); N2 links to N3 through R2.1]

N3 will only be visited/reachable by the client when the client is interested in members between 5 and 10, regardless of what R2.1 says
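To make the caveat concrete, the sketch below computes the bounds a client effectively applies along a path. The slide does not say what R2.1 is, so the value used for it here ("greater than 0") is purely hypothetical.

```typescript
type Relation = { type: "GreaterThanRelation" | "LessThanRelation"; value: number };
type Bounds = { lower: number; upper: number }; // exclusive bounds

// Each relation on the path can only narrow the bounds, never widen them.
function narrow(bounds: Bounds, relation: Relation): Bounds {
  return relation.type === "GreaterThanRelation"
    ? { lower: Math.max(bounds.lower, relation.value), upper: bounds.upper }
    : { lower: bounds.lower, upper: Math.min(bounds.upper, relation.value) };
}

function boundsAlongPath(relations: Relation[]): Bounds {
  return relations.reduce(narrow, { lower: -Infinity, upper: Infinity });
}

const pathToN3: Relation[] = [
  { type: "GreaterThanRelation", value: 5 }, // R1.2
  { type: "LessThanRelation", value: 10 }, // R1.3
  { type: "GreaterThanRelation", value: 0 }, // R2.1 (hypothetical value)
];
const effective = boundsAlongPath(pathToN3);
console.log(effective.lower, effective.upper); // 5 10
```

Even though the hypothetical R2.1 claims "greater than 0", members of N3 below 5 would never be found by a client that pruned at N1.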