1 of 19

The fourth W3C TREE CG meeting

2023-08-16

2 of 19

Agenda

  • Introductions
  • Issue 71: the member extraction algorithm
  • Any other business: issue prioritization, the next call

3 of 19

Someone new?

4 of 19

Member Extraction

Current proposal – using CBD (Concise Bounded Description) + shape hints

https://github.com/pietercolpaert/extract-cbd-shape

5 of 19

What’s a tree:Member?

A tree:Member is a set of triples.

The triples that are part of the set are defined by “the member extraction algorithm” (see further).

tree:member refers to the primary topic of a member that can be used to extract the member. This is not an ID for the tree:Member itself.

[Diagram: a tree:Collection points via tree:member to the URIs of your entities]

6 of 19

Member extraction algorithm

The algorithm that extracts all triples describing an entity from a set of triples (such as an RDF page, or a message on a pubsub channel), and potentially does HTTP requests to fetch more triples according to well-defined criteria.

The algorithm MUST always return the same set of triples for the same input.

It can also be seen as a SPARQL DESCRIBE query that may fetch more data when relevant: https://github.com/w3c/sparql-dev/issues/39

7 of 19

Design considerations

One may

  • want to have a tree:Collection with out-of-band members, or with parts of the members out of band
  • not want to have a full shape defined (tree:shape is optional), and instead just support star-shaped data
  • want to use named graphs for members

Different implementations must extract the same triples

The shape and data at hand come from the same source, so we can consider that the data given ought to be valid

8 of 19

Feature 1: if member extraction doesn’t return anything, do an HTTP request to the entity if this wasn’t done before

Use case ISSUE 77: no triples of the member embedded in the page https://github.com/TREEcg/specification/issues/77

<> a tree:Collection ;
    tree:member
        <metasequoia-disticha>,
        <metasequoia-foxii>,
        <metasequoia-glyptostroboides> ;
    tree:view <?limit=10> .

<?limit=10> a tree:Node ;
    tree:relation [ a tree:GreaterThanRelation ;
        tree:path ex:ultimateHeightInMeters ;
        tree:node <?limit=10&offset=10> ;
        tree:qualifiedValue <pinus-alepensis>
    ] .
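The fallback in Feature 1 can be sketched as follows. This is a minimal illustration, not the reference implementation: triples are plain string tuples, and `extract_member`, `fake_dereference` and `already_fetched` are hypothetical names. The HTTP request is replaced by an injected callable so the sketch runs without a network.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def extract_member(
    member: str,
    page: Iterable[Triple],
    dereference: Callable[[str], Iterable[Triple]],
    already_fetched: Set[str],
) -> Set[Triple]:
    """Return the triples about `member`; fetch its IRI at most once if the page has none."""
    found = {t for t in page if t[0] == member}
    if not found and member not in already_fetched:
        # Nothing embedded in the page: fall back to one request to the entity itself.
        already_fetched.add(member)
        found = {t for t in dereference(member) if t[0] == member}
    return found


# Toy page: the member is referenced but none of its triples are embedded.
page: List[Triple] = [("<>", "tree:member", "<metasequoia-disticha>")]

# Stand-in for the web (assumption: a real client would do an HTTP GET here).
remote = {
    "<metasequoia-disticha>": [
        ("<metasequoia-disticha>", "ex:name", '"Metasequoia Disticha"'),
    ]
}
calls: List[str] = []


def fake_dereference(iri: str) -> Iterable[Triple]:
    calls.append(iri)
    return remote.get(iri, [])


fetched: Set[str] = set()
result = extract_member("<metasequoia-disticha>", page, fake_dereference, fetched)
```

Keeping track of `already_fetched` is what implements "if this wasn't done before": a second extraction of the same member does not trigger another request.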

9 of 19

Feature 2: CBD - extract triples with that subject, and their blank nodes recursively

<> a tree:Collection ;
    tree:member <metasequoia-disticha> .

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
    ex:ultimateHeightInMeters "12" ;
    ex:subFamily [ ex:name "Sequoioideae" ] .
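A minimal sketch of the CBD step on the data above, assuming triples are string tuples and blank nodes are written with a `_:` prefix. The function name `cbd` is illustrative, and the reification part of the full CBD definition is left out.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Blank nodes are encoded as strings starting with '_:' in this sketch."""
    return term.startswith("_:")


def cbd(subject: str, triples: List[Triple]) -> Set[Triple]:
    """Collect all triples with `subject`, following blank-node objects recursively."""
    result: Set[Triple] = set()
    visited: Set[str] = set()
    queue = [subject]
    while queue:
        node = queue.pop()
        if node in visited:
            continue  # guards against cycles between blank nodes
        visited.add(node)
        for triple in triples:
            if triple[0] == node:
                result.add(triple)
                if is_blank(triple[2]):
                    queue.append(triple[2])
    return result


page: List[Triple] = [
    ("<>", "rdf:type", "tree:Collection"),
    ("<>", "tree:member", "<metasequoia-disticha>"),
    ("<metasequoia-disticha>", "ex:name", '"Metasequoia Disticha"'),
    ("<metasequoia-disticha>", "ex:ultimateHeightInMeters", '"12"'),
    ("<metasequoia-disticha>", "ex:subFamily", "_:b0"),
    ("_:b0", "ex:name", '"Sequoioideae"'),
]

member = cbd("<metasequoia-disticha>", page)
```

Here `member` holds the three triples about the entity plus the blank node's `ex:name` triple; the collection triples are not included.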

10 of 19

Feature 3: Extract more based on shape, and for every sh:node, extract CBD + shape again

But what aspects of SHACL do we take into account?

<> a tree:Collection ;
    tree:member <metasequoia-disticha> ;
    tree:shape <Shape.ttl#Family> .

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
    ex:ultimateHeightInMeters "12" ;
    ex:subFamily <Sequoioideae> .

<Sequoioideae> ex:name "Sequoioideae" .

<Shape.ttl#Family> a sh:NodeShape ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1
    ],[
        sh:path ex:subFamily ;
        sh:node [
            sh:property [
                sh:path ex:name ;
                sh:minCount 1
            ]
        ]
    ] .
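The "CBD + shape, again for every sh:node" recursion can be sketched like this. It is a simplification under stated assumptions: the shape is reduced to a hypothetical `node_links` dictionary that maps a predicate carrying an sh:node to the nested dictionary for that node, and `extract_with_shape` is an illustrative name, not the API of extract-cbd-shape.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def cbd(subject: str, triples: List[Triple]) -> Set[Triple]:
    """Triples with `subject`, plus the triples of blank-node objects, recursively."""
    result: Set[Triple] = set()
    visited: Set[str] = set()
    queue = [subject]
    while queue:
        node = queue.pop()
        if node in visited:
            continue
        visited.add(node)
        for triple in triples:
            if triple[0] == node:
                result.add(triple)
                if triple[2].startswith("_:"):
                    queue.append(triple[2])
    return result


def extract_with_shape(
    focus: str,
    triples: List[Triple],
    node_links: Dict[str, dict],
    seen: Optional[Set[Tuple[str, int]]] = None,
) -> Set[Triple]:
    """CBD of `focus`, then the same again for every object behind a node link."""
    seen = set() if seen is None else seen
    key = (focus, id(node_links))
    if key in seen:
        return set()  # stops on recursive shapes and cyclic data
    seen.add(key)
    result = cbd(focus, triples)
    for predicate, nested in node_links.items():
        for s, p, o in triples:
            if s == focus and p == predicate:
                result |= extract_with_shape(o, triples, nested, seen)
    return result


page: List[Triple] = [
    ("<metasequoia-disticha>", "ex:name", '"Metasequoia Disticha"'),
    ("<metasequoia-disticha>", "ex:ultimateHeightInMeters", '"12"'),
    ("<metasequoia-disticha>", "ex:subFamily", "<Sequoioideae>"),
    ("<Sequoioideae>", "ex:name", '"Sequoioideae"'),
]

# ex:subFamily carries an sh:node; the nested shape has no further node links.
family_shape: Dict[str, dict] = {"ex:subFamily": {}}

plain = cbd("<metasequoia-disticha>", page)
shaped = extract_with_shape("<metasequoia-disticha>", page, family_shape)
```

Plain CBD stops at the named node `<Sequoioideae>`; with the shape hint its `ex:name` triple becomes part of the member as well.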

11 of 19

Shape Fragments?

Extracts, from a list of triples, all sets of triples that conform to a SHACL shape

Heavy, as it also validates the data against the shape… In our case: we can trust the data publisher that the data will be valid, and maybe we are even interested in invalid data that we can validate and/or fix later?

⇒ so let’s find something else we can do

https://github.com/Shape-Fragments/old-shapefragments-paper/blob/main/fullpaper.pdf

12 of 19

New proposal: SHACL discovery templates

  • Extract all nodelinks at certain paths
  • Extract all required paths
    • ⇒ if a required path has not been set, do an HTTP request

Slight difficulty: conditionals

  • sh:xone → process the options in order until the first matching one has been found, then process its nodelinks and required properties
  • sh:and → concatenate nodelinks and required properties
  • sh:or → handled the same way as sh:and, because sh:and is the worst case of sh:or, and this is the only way we can always make sure we extract the same triples.
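The handling of the three conditionals could look roughly like the sketch below. The `Template` class, its field names, and the `matches` callback (which decides whether an sh:xone option fits the data at hand) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Template:
    """Hypothetical, flattened view of a shape for discovery purposes."""
    required_paths: List[str] = field(default_factory=list)
    node_links: List[str] = field(default_factory=list)
    and_: List["Template"] = field(default_factory=list)
    or_: List["Template"] = field(default_factory=list)
    xone: List["Template"] = field(default_factory=list)


def collect(
    template: Template, matches: Callable[[Template], bool]
) -> Tuple[List[str], List[str]]:
    """Return (required paths, node links) after resolving the conditionals."""
    required = list(template.required_paths)
    links = list(template.node_links)
    # sh:and and sh:or are both concatenated: sh:and is the worst case of sh:or.
    for branch in template.and_ + template.or_:
        branch_required, branch_links = collect(branch, matches)
        required += branch_required
        links += branch_links
    # sh:xone: only the first option that matches contributes.
    for branch in template.xone:
        if matches(branch):
            branch_required, branch_links = collect(branch, matches)
            required += branch_required
            links += branch_links
            break
    return required, links


template = Template(
    required_paths=["ex:name"],
    or_=[Template(node_links=["ex:subFamily"]), Template(required_paths=["ex:height"])],
    xone=[Template(required_paths=["ex:a"]), Template(required_paths=["ex:b"])],
)

# Toy matcher: pretend only the option requiring ex:b fits the data.
required, links = collect(template, lambda option: "ex:b" in option.required_paths)
```

Treating sh:or like sh:and may extract more than strictly needed, but the result does not depend on which option an implementation happens to try first.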

13 of 19

Example:

A SHACL discovery template with a required path and a nodeLink at a certain path to ex:3. There’s also an xone that needs to be validated against the current set of quads

Shape {
    requiredPaths: [
        Path { pathItems: [
            PredicatePath {value: "ex:1"},
            PredicatePath {value: "ex:2"}] } ],
    nodeLinks: [ NodeLink {pathPattern: …, link: "ex:3"} ],
    xone: [ {Shape} ]
}

14 of 19

What HTTP request should be done?

The one of the current focus node? Or one on a broken path?

⇒ proposal: only do an HTTP request based on a base focus node, not on a broken path

# Variant with a recursive sh:node
<Shape.ttl#Family> a sh:NodeShape ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1
    ],[
        sh:path ex:subFamily ;
        sh:node <Shape.ttl#Family>
    ] .

# Variant with a property path
<Shape.ttl#Family> a sh:NodeShape ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1
    ],[
        sh:path [ sh:oneOrMorePath (ex:subFamily ex:name) ] ;
        sh:minCount 0
    ] .

15 of 19

What HTTP request should be done?

The one of the current focus node? Or one on a broken path?

⇒ proposal: only do an HTTP request based on a base focus node, not on a broken path

<> a tree:Collection ;
    tree:member <metasequoia-disticha> ;
    tree:shape <Shape.ttl#Family> .

<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
    ex:ultimateHeightInMeters "12" ;
    ex:subFamily <Sequoioideae> .

#<Sequoioideae> ex:name "Sequoioideae" .

<Shape.ttl#Family> a sh:NodeShape ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1
    ],[
        sh:path [ sh:oneOrMorePath (ex:subFamily ex:name) ] ;
        sh:minCount 1
    ] .
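The proposal can be sketched as a small decision function: when a required path cannot be completed, the IRI to dereference is the base focus node, never the node where the path broke. The sketch assumes simple sequence paths (the sh:oneOrMorePath of the example is reduced to the sequence ex:subFamily / ex:name), and `follow` and `iri_to_dereference` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def follow(start: str, path: List[str], triples: List[Triple]) -> Set[str]:
    """Return the nodes reached from `start` by following `path` predicate by predicate."""
    nodes = {start}
    for predicate in path:
        nodes = {o for (s, p, o) in triples if s in nodes and p == predicate}
    return nodes


def iri_to_dereference(
    focus: str,
    required_paths: List[List[str]],
    triples: List[Triple],
    fetched: Set[str],
) -> Optional[str]:
    """Return the base focus node if a required path is broken and it was not fetched yet."""
    broken = any(not follow(focus, path, triples) for path in required_paths)
    if broken and focus not in fetched:
        return focus
    return None


# The page from the example: the ex:name triple of <Sequoioideae> is missing.
page: List[Triple] = [
    ("<metasequoia-disticha>", "ex:name", '"Metasequoia Disticha"'),
    ("<metasequoia-disticha>", "ex:ultimateHeightInMeters", '"12"'),
    ("<metasequoia-disticha>", "ex:subFamily", "<Sequoioideae>"),
]
required = [["ex:name"], ["ex:subFamily", "ex:name"]]

target = iri_to_dereference("<metasequoia-disticha>", required, page, set())

# With the missing triple present, no request is needed.
complete = page + [("<Sequoioideae>", "ex:name", '"Sequoioideae"')]
no_target = iri_to_dereference("<metasequoia-disticha>", required, complete, set())
```

The path breaks at `<Sequoioideae>`, yet the request goes to `<metasequoia-disticha>`, so every implementation fetches the same document.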

16 of 19

Feature 4: Extract triples in a named graph

Makes versioning members a lot easier, but a strict limitation is that a graph name MUST NOT be used in another member

<> a tree:Collection ;
    tree:member <metasequoia-disticha-v1> .

<metasequoia-disticha-v1> {
    <metasequoia-disticha> ex:name "Metasequoia Disticha" ;
        ex:ultimateHeightInMeters "12" ;
        ex:subFamily [ ex:name "Sequoioideae" ] .
}
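A minimal sketch of Feature 4, assuming quads are string tuples of the form (subject, predicate, object, graph) with an empty string for the default graph. The name `extract_named_graph` is illustrative.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


def extract_named_graph(member: str, quads: List[Quad]) -> Set[Triple]:
    """Return every triple stored in the graph named after the member IRI."""
    return {(s, p, o) for (s, p, o, g) in quads if g == member}


quads: List[Quad] = [
    ("<>", "rdf:type", "tree:Collection", ""),
    ("<>", "tree:member", "<metasequoia-disticha-v1>", ""),
    ("<metasequoia-disticha>", "ex:name", '"Metasequoia Disticha"', "<metasequoia-disticha-v1>"),
    ("<metasequoia-disticha>", "ex:ultimateHeightInMeters", '"12"', "<metasequoia-disticha-v1>"),
    ("<metasequoia-disticha>", "ex:subFamily", "_:b0", "<metasequoia-disticha-v1>"),
    ("_:b0", "ex:name", '"Sequoioideae"', "<metasequoia-disticha-v1>"),
]

version_1 = extract_named_graph("<metasequoia-disticha-v1>", quads)
```

Because the member is simply "everything in that graph", two members sharing a graph name would be indistinguishable, which is why the graph name must not be reused.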

17 of 19

Demo

See test cases in https://github.com/pietercolpaert/extract-cbd-shape

See README.md for more explanation about the implementation and design

This implementation was made to learn about all the design options, and does not focus on performance.

18 of 19

Next steps

If this is the way to go:

  • Adapt PR78 to formalize this instead
  • Finish implementation in https://github.com/pietercolpaert/extract-cbd-shape and move to the TREECG organization

19 of 19

Any other business

  • Issue prioritization
  • Next community group call