The fourth W3C TREE CG meeting
2023-08-16
Agenda
Someone new?
Member Extraction
Current proposal – using CBD + shape hints
What’s a tree:Member?
A tree:Member is a set of triples.
The triples that are part of the set are defined by “the member extraction algorithm” (see further).
The object of tree:member is the primary topic of a member, from which the member can be extracted. It is not an identifier of the tree:Member itself.
(Diagram: tree:Collection → tree:member → URIs of your entities)
Member extraction algorithm
The algorithm extracts all triples describing an entity from a set of triples (such as an RDF page, or a message on a pubsub channel), and may perform HTTP requests to fetch more triples according to well-defined criteria.
The algorithm MUST always return the same set of triples for the same input
It can also be seen as a SPARQL DESCRIBE query that may fetch more data when relevant: https://github.com/w3c/sparql-dev/issues/39
Design considerations
One may
Different implementations must extract the same triples
The shape and the data at hand come from the same source, so we can assume that the data is valid according to the shape
Feature 1: if member extraction doesn’t return anything, do an HTTP request on the entity’s URI, unless this was already done
Use case (issue 77): none of the member’s triples are embedded in the page: https://github.com/TREEcg/specification/issues/77
<> a tree:Collection;
   tree:member
      <metasequoia-disticha>,
      <metasequoia-foxii>,
      <metasequoia-glyptostroboides>;
   tree:view <?limit=10>.
<?limit=10> a tree:Node;
   tree:relation [ a tree:GreaterThanRelation;
      tree:path ex:ultimateHeightInMeters ;
      tree:node <?limit=10&offset=10>;
      tree:qualifiedValue <pinus-alepensis>
   ].
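Feature 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the extract-cbd-shape implementation: terms are plain strings, `dereference` is a hypothetical synchronous stand-in for an HTTP GET plus parsing (a real client would be asynchronous), and `fakeServer` returns made-up data. Fetched triples are kept in the store, so a second lookup neither repeats the request nor comes back empty.

```typescript
// Terms are encoded as plain strings to keep this sketch self-contained.
interface Triple {
  s: string;
  p: string;
  o: string;
}

// Hypothetical hook: fetches and parses the document behind an IRI.
type Dereference = (iri: string) => Triple[];

// Only the direct triples about `subject`; CBD recursion is left out here.
function triplesAbout(store: Triple[], subject: string): Triple[] {
  return store.filter((t) => t.s === subject);
}

// Feature 1: if nothing about the member is found, dereference the member IRI,
// but never twice. Fetched triples are added to the store for later lookups.
function extractWithFallback(
  store: Triple[],
  member: string,
  dereference: Dereference,
  alreadyFetched: Set<string>,
): Triple[] {
  let found = triplesAbout(store, member);
  if (found.length === 0 && !alreadyFetched.has(member)) {
    alreadyFetched.add(member);
    store.push(...dereference(member));
    found = triplesAbout(store, member);
  }
  return found;
}

// Usage: the page lists the member but embeds none of its triples (issue 77).
const store: Triple[] = [{ s: "<>", p: "tree:member", o: "<metasequoia-foxii>" }];
const fetched = new Set<string>();
let httpCalls = 0;
const fakeServer: Dereference = (iri) => {
  httpCalls++;
  return [{ s: iri, p: "ex:name", o: '"Metasequoia foxii"' }]; // made-up response
};

const first = extractWithFallback(store, "<metasequoia-foxii>", fakeServer, fetched);
const second = extractWithFallback(store, "<metasequoia-foxii>", fakeServer, fetched);
console.log(first.length, second.length, httpCalls); // 1 1 1
```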
Feature 2: CBD (Concise Bounded Description): extract the triples with that subject and, recursively, the triples of the blank nodes they refer to
<> a tree:Collection;
   tree:member <metasequoia-disticha> .
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
   ex:ultimateHeightInMeters "12" ;
   ex:subFamily [ ex:name "Sequoioideae" ] .
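A minimal sketch of Feature 2 on the example above, assuming a simplified string encoding of terms (blank nodes start with `_:`, literals with a double quote). It omits the reification rule of the original CBD definition and only shows the subject plus blank-node recursion; the collection triples about `<>` are correctly left out of the member.

```typescript
// Terms as strings (a simplification for this sketch): "_:..." is a blank node,
// '"..."' is a literal, anything else is an IRI or prefixed name.
interface Triple {
  s: string;
  p: string;
  o: string;
}

const isBlankNode = (term: string): boolean => term.startsWith("_:");

// Concise Bounded Description: all triples with `subject` as subject, plus,
// recursively, the triples of every blank node reached as an object.
function extractCBD(triples: Triple[], subject: string): Triple[] {
  const result: Triple[] = [];
  const visited = new Set<string>(); // guards against blank-node cycles
  const queue: string[] = [subject];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (visited.has(current)) continue;
    visited.add(current);
    for (const t of triples) {
      if (t.s !== current) continue;
      result.push(t);
      if (isBlankNode(t.o)) queue.push(t.o);
    }
  }
  return result;
}

// The page from the example, with prefixes left unexpanded.
const page: Triple[] = [
  { s: "<>", p: "rdf:type", o: "tree:Collection" },
  { s: "<>", p: "tree:member", o: "<metasequoia-disticha>" },
  { s: "<metasequoia-disticha>", p: "ex:name", o: '"Metasequoia Disticha"' },
  { s: "<metasequoia-disticha>", p: "ex:ultimateHeightInMeters", o: '"12"' },
  { s: "<metasequoia-disticha>", p: "ex:subFamily", o: "_:b0" },
  { s: "_:b0", p: "ex:name", o: '"Sequoioideae"' },
];

const member = extractCBD(page, "<metasequoia-disticha>");
console.log(member.length); // 4
```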
Feature 3: Extract more based on the shape: for every sh:node, extract the CBD of the nodes reached and apply the shape again
But which aspects of SHACL do we take into account?
<> a tree:Collection;
   tree:member <metasequoia-disticha> ;
   tree:shape <Shape.ttl#Family> .
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
   ex:ultimateHeightInMeters "12" ;
   ex:subFamily <Sequoioideae> .
<Sequoioideae> ex:name "Sequoioideae" .
<Shape.ttl#Family> a sh:NodeShape ;
   sh:property [
      sh:path ex:name ;
      sh:minCount 1
   ],[
      sh:path ex:subFamily ;
      sh:node [
         sh:property [
            sh:path ex:name ;
            sh:minCount 1
         ]
      ]
   ] .
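Feature 3 could look roughly like this. The `ShapeHint` model is a hypothetical simplification (predicate paths only, constraints such as sh:minCount ignored, since the shape is used as an extraction hint rather than for validation). Each (node, shape) pair is handled once, which lets a recursive shape terminate on cyclic data. On the example, plain CBD yields 3 triples, while following the sh:node on ex:subFamily also pulls in the name of `<Sequoioideae>`.

```typescript
interface Triple {
  s: string;
  p: string;
  o: string;
}

// Simplified, hypothetical shape model: predicate paths only, plus an optional
// nested node shape (the sh:node of the property) to follow.
interface ShapeHint {
  properties: { path: string; node?: ShapeHint }[];
}

const isBlankNode = (term: string): boolean => term.startsWith("_:");

// CBD: triples of `subject`, recursing into blank-node objects.
function addCbd(triples: Triple[], subject: string, out: Triple[], described: Set<string>): void {
  if (described.has(subject)) return;
  described.add(subject);
  for (const t of triples) {
    if (t.s !== subject) continue;
    out.push(t);
    if (isBlankNode(t.o)) addCbd(triples, t.o, out, described);
  }
}

// Feature 3: CBD of the focus node, then for every property with an sh:node,
// follow the path and repeat CBD + shape on the nodes reached.
function extractWithShape(
  triples: Triple[],
  focus: string,
  shape: ShapeHint,
  out: Triple[] = [],
  described: Set<string> = new Set(),
  visited: Map<ShapeHint, Set<string>> = new Map(),
): Triple[] {
  const seen = visited.get(shape) ?? new Set<string>();
  if (seen.has(focus)) return out; // (node, shape) pair already done
  seen.add(focus);
  visited.set(shape, seen);

  addCbd(triples, focus, out, described);
  for (const prop of shape.properties) {
    const nested = prop.node;
    if (!nested) continue;
    for (const t of triples) {
      if (t.s === focus && t.p === prop.path) {
        extractWithShape(triples, t.o, nested, out, described, visited);
      }
    }
  }
  return out;
}

const data: Triple[] = [
  { s: "<metasequoia-disticha>", p: "ex:name", o: '"Metasequoia Disticha"' },
  { s: "<metasequoia-disticha>", p: "ex:ultimateHeightInMeters", o: '"12"' },
  { s: "<metasequoia-disticha>", p: "ex:subFamily", o: "<Sequoioideae>" },
  { s: "<Sequoioideae>", p: "ex:name", o: '"Sequoioideae"' },
];
// <Shape.ttl#Family> from the example, reduced to what matters for extraction.
const family: ShapeHint = {
  properties: [
    { path: "ex:name" },
    { path: "ex:subFamily", node: { properties: [{ path: "ex:name" }] } },
  ],
};
const cbdOnly: Triple[] = [];
addCbd(data, "<metasequoia-disticha>", cbdOnly, new Set());
const withShape = extractWithShape(data, "<metasequoia-disticha>", family);
console.log(cbdOnly.length, withShape.length); // 3 4
```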
Shape Fragments?
Extracts, from a set of triples, all subsets of triples that validate against a SHACL shape
Heavy, as it also checks that the data conforms to the shape. In our case, we can trust the data publisher to provide valid data, and we may even be interested in invalid data that we can validate and/or fix later.
⇒ so let’s find something else we can do
https://github.com/Shape-Fragments/old-shapefragments-paper/blob/main/fullpaper.pdf
New proposal: SHACL discovery templates
Slight difficulty: conditionals
Example: a SHACL discovery template with a required path (ex:1 / ex:2) and a nodeLink at a certain path to ex:3. There is also an xone that needs to be validated against the current set of quads
Shape {
  requiredPaths: [
    Path { pathItems: [
      PredicatePath {value: "ex:1"},
      PredicatePath {value: "ex:2"}] } ],
  nodeLinks: [ NodeLink {pathPattern: …, link: "ex:3"} ],
  xone: [ {Shape} ]
}
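The conditional part of such a template can be sketched as below. The types are a hypothetical TypeScript rendering of the pseudocode (nodeLinks left out), and "conforms" is approximated as "every required path reaches at least one node". The predicates ex:4 and ex:5 are made up for the example. `pickXone` only commits to a branch when exactly one alternative conforms to the quads seen so far; with incomplete data that answer may change once more triples arrive, which is what makes conditionals tricky.

```typescript
interface Triple {
  s: string;
  p: string;
  o: string;
}

// Hypothetical rendering of the template above (nodeLinks left out).
interface Path {
  pathItems: string[]; // a sequence of predicate paths, e.g. ex:1 / ex:2
}
interface Shape {
  requiredPaths: Path[];
  xone: Shape[];
}

// Nodes reached from `focus` by following the sequence path.
function followPath(triples: Triple[], focus: string, path: Path): string[] {
  let frontier = [focus];
  for (const predicate of path.pathItems) {
    const current = frontier;
    frontier = triples.filter((t) => current.includes(t.s) && t.p === predicate).map((t) => t.o);
  }
  return frontier;
}

// Approximation of conformance: every required path reaches at least one node.
function conforms(triples: Triple[], focus: string, shape: Shape): boolean {
  return shape.requiredPaths.every((p) => followPath(triples, focus, p).length > 0);
}

// sh:xone: exactly one alternative must conform to the quads seen so far.
// Returns undefined when zero or several alternatives conform.
function pickXone(triples: Triple[], focus: string, shape: Shape): Shape | undefined {
  const matching = shape.xone.filter((alt) => conforms(triples, focus, alt));
  return matching.length === 1 ? matching[0] : undefined;
}

const triples: Triple[] = [
  { s: "<x>", p: "ex:1", o: "_:a" },
  { s: "_:a", p: "ex:2", o: '"v"' },
  { s: "<x>", p: "ex:4", o: '"y"' },
];
const template: Shape = {
  requiredPaths: [{ pathItems: ["ex:1", "ex:2"] }],
  xone: [
    { requiredPaths: [{ pathItems: ["ex:4"] }], xone: [] },
    { requiredPaths: [{ pathItems: ["ex:5"] }], xone: [] },
  ],
};
console.log(conforms(triples, "<x>", template)); // true
console.log(pickXone(triples, "<x>", template) === template.xone[0]); // true
```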
What HTTP request should be done?
The one for the current focus node? Or one on a broken path?
⇒ proposal: only do an HTTP request for a base focus node, not on a broken path
<Shape.ttl#Family> a sh:NodeShape ;
   sh:property [
      sh:path ex:name ;
      sh:minCount 1
   ],[
      sh:path ex:subFamily ;
      sh:node <Shape.ttl#Family>
   ] .
<Shape.ttl#Family> a sh:NodeShape ;
   sh:property [
      sh:path ex:name ;
      sh:minCount 1
   ],[
      sh:path [ sh:oneOrMorePath (ex:subFamily ex:name) ] ;
      sh:minCount 0
   ] .
<> a tree:Collection;
   tree:member <metasequoia-disticha> ;
   tree:shape <Shape.ttl#Family> .
<metasequoia-disticha> ex:name "Metasequoia Disticha" ;
   ex:ultimateHeightInMeters "12" ;
   ex:subFamily <Sequoioideae> .
#<Sequoioideae> ex:name "Sequoioideae" .
<Shape.ttl#Family> a sh:NodeShape ;
   sh:property [
      sh:path ex:name ;
      sh:minCount 1
   ],[
      sh:path [ sh:oneOrMorePath (ex:subFamily ex:name) ] ;
      sh:minCount 1
   ] .
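The proposal on HTTP requests can be illustrated on this example, where the ex:name of `<Sequoioideae>` is missing. The sketch is a simplification: it walks only the first iteration of the `sh:oneOrMorePath` (ex:subFamily followed by ex:name), and the helper names `brokenNodes` and `iriToFetch` are hypothetical. The path breaks at `<Sequoioideae>`, but under the proposal that node is only reported; the base focus node already has triples, so no request is made.

```typescript
interface Triple {
  s: string;
  p: string;
  o: string;
}

// Walk a sequence of predicates from `start`; return the nodes where the walk stops
// (an empty array means the whole path could be followed).
function brokenNodes(triples: Triple[], start: string, predicates: string[]): string[] {
  let frontier = [start];
  for (const predicate of predicates) {
    const next: string[] = [];
    const broken: string[] = [];
    for (const node of frontier) {
      const objects = triples.filter((t) => t.s === node && t.p === predicate).map((t) => t.o);
      if (objects.length === 0) broken.push(node);
      next.push(...objects);
    }
    if (broken.length > 0) return broken;
    frontier = next;
  }
  return [];
}

// Proposal: only the base focus node is ever dereferenced (when nothing is known
// about it); nodes on a broken path are reported, not fetched.
function iriToFetch(triples: Triple[], baseFocus: string): string | undefined {
  return triples.some((t) => t.s === baseFocus) ? undefined : baseFocus;
}

// The data from the example, with the ex:name of <Sequoioideae> missing.
const data: Triple[] = [
  { s: "<metasequoia-disticha>", p: "ex:name", o: '"Metasequoia Disticha"' },
  { s: "<metasequoia-disticha>", p: "ex:ultimateHeightInMeters", o: '"12"' },
  { s: "<metasequoia-disticha>", p: "ex:subFamily", o: "<Sequoioideae>" },
];
const memberIri = "<metasequoia-disticha>";
console.log(brokenNodes(data, memberIri, ["ex:subFamily", "ex:name"])); // path breaks at <Sequoioideae>
console.log(iriToFetch(data, memberIri)); // undefined: no HTTP request
```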
Feature 4: Extract triples in a named graph
Makes versioning members a lot easier, but with a strict limitation: a graph name MUST NOT be used by another member
<> a tree:Collection;
   tree:member <metasequoia-disticha-v1> .
<metasequoia-disticha-v1> {
   <metasequoia-disticha> ex:name "Metasequoia Disticha" ;
      ex:ultimateHeightInMeters "12" ;
      ex:subFamily [ ex:name "Sequoioideae" ] .
}
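A minimal sketch of Feature 4, assuming quads with a string graph name (`""` for the default graph). The member is simply every quad in the graph named by the tree:member object. To show why this helps versioning, a hypothetical second version `<metasequoia-disticha-v2>` with a made-up height is added: extraction by graph keeps the versions apart, whereas extraction by subject would mix them. This relies on the limitation above that a graph name is never reused by another member.

```typescript
// A quad: subject, predicate, object, and graph name ("" stands for the default graph).
interface Quad {
  s: string;
  p: string;
  o: string;
  g: string;
}

// Feature 4 (sketch): the member is exactly the set of quads in the named graph
// whose name is the tree:member object.
function extractNamedGraph(quads: Quad[], member: string): Quad[] {
  return quads.filter((q) => q.g === member);
}

const v1 = "<metasequoia-disticha-v1>";
const v2 = "<metasequoia-disticha-v2>"; // hypothetical later version, for illustration
const entity = "<metasequoia-disticha>";
const dataset: Quad[] = [
  { s: "<>", p: "rdf:type", o: "tree:Collection", g: "" },
  { s: "<>", p: "tree:member", o: v1, g: "" },
  { s: "<>", p: "tree:member", o: v2, g: "" },
  { s: entity, p: "ex:name", o: '"Metasequoia Disticha"', g: v1 },
  { s: entity, p: "ex:ultimateHeightInMeters", o: '"12"', g: v1 },
  { s: entity, p: "ex:subFamily", o: "_:b0", g: v1 },
  { s: "_:b0", p: "ex:name", o: '"Sequoioideae"', g: v1 },
  { s: entity, p: "ex:ultimateHeightInMeters", o: '"13"', g: v2 }, // made-up value
];

const heights = (qs: Quad[]): string[] =>
  qs.filter((q) => q.p === "ex:ultimateHeightInMeters").map((q) => q.o);

const firstVersion = extractNamedGraph(dataset, v1);
// Subject-based extraction would mix both versions of the same entity:
const bySubject = dataset.filter((q) => q.s === entity);
console.log(heights(firstVersion)); // only the v1 height
console.log(heights(bySubject)); // heights of both versions
```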
Demo
See the test cases at https://github.com/pietercolpaert/extract-cbd-shape
See the README.md for more details about the implementation and its design choices
This implementation was made to learn about all the design options, and does not focus on performance.
Next steps
If this is the way to go:
Any other business