Semantic Web, Web of Data, Linked Data, Knowledge Graphs

Hala Skaf-Molli

Hala.Skaf@univ-nantes.fr

http://pagesperso.ls2n.fr/~skaf-h

University of Nantes, LS2N, France

(Distributed Data Management) GDD Team

State of the Art

Ontology

Plan

  • What is an ontology?
  • Why do we need an ontology?
  • Building and reasoning with ontologies
  • Applications of ontologies in biomedicine
  • Conclusion

What is an ontology?

  • An ontology is an explicit specification of a conceptualization (Thomas Gruber, 1993; Gruber later co-founded Siri). This is the original definition.

    • A conceptualization is the way we think about a domain
    • A specification provides a formal way of writing it down

What is an ontology?

  • An ontology is defined by axioms in a formal language, with the goal of providing an unbiased (domain- and application-independent) view of reality

  • Concretely, an ontology is a set of classes (or terms or concepts) with relations that operate between them

Gene Ontology (GO)

  • The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species (Wikipedia).

  • GO defines types used to describe gene function
  • It classifies functions along three aspects:
    • molecular function
      • what gene products do
    • cellular component
      • where gene products operate
    • biological process
      • the pathways and processes that the gene products participate in

GO facilitates interoperability of function description across species

Find Needle in Haystack

Why develop an ontology?

  • To make domain assumptions explicit
  • To share common understanding of the entities in a given domain
  • To enable reuse of data and knowledge
  • To enable biomedical discovery

Making domain assumptions explicit

  • Hard-coding assumptions about the world in programming-language code makes these assumptions not only hard to find and understand but also hard to change, in particular for someone without programming expertise.
  • In addition, explicit specifications of domain knowledge are useful for new users who must learn what terms in the domain mean.

Sharing common understanding of the structure of information among people or software agents

  • Suppose several different Web sites contain medical information or provide medical e-commerce services.
    • If these Web sites share and publish the same underlying ontology of the terms they all use, then computer agents can extract and aggregate information from these different sites. (http://schema.org)
    • The agents can use this aggregated information to answer user queries or as input data to other applications.

Enabling reuse of domain knowledge

  • For example, models for many different domains need to represent the notion of time. This representation includes the notions of time intervals, points in time, relative measures of time, and so on.
    • If one group of researchers develops such an ontology in detail, others can simply reuse it for their domains.
    • If we need to build a large ontology, we can integrate several existing ontologies describing portions of the large domain.

Formalization

  • Formal means:
    • Expressions are represented logically in a machine-readable format
    • Languages: RDF Schema (RDFS), OWL (Web Ontology Language)
  • A class (or term or concept) can be understood as a set of similar entities
    • Examples: Person, Man, Woman, Drug
  • A relation (or predicate or property) links entities
    • Examples: has-mother, has-target

Ontology (or Knowledge Base or Knowledge Graph) = Schema + Instance

  • Schema (TBox)
    • The set of classes and relations
    • Constraints

  • Instance (ABox)
    • The set of facts: base facts together with inferred facts
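
The schema/instance split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, a triple counts as schema when its predicate is one of a small set of schema-level predicates, and the names SCHEMA_PREDICATES and split_tbox_abox are hypothetical helpers, not part of any library. The sample triples are borrowed from the drug example used later in these slides.

```python
# Triples are (subject, predicate, object) tuples.
# Assumption: these predicates mark a triple as schema-level (TBox).
SCHEMA_PREDICATES = {"subClassOf", "subPropertyOf", "domain", "range"}

def split_tbox_abox(triples):
    """Split a set of triples into schema (TBox) and instance (ABox) parts."""
    tbox = {t for t in triples if t[1] in SCHEMA_PREDICATES}
    abox = set(triples) - tbox
    return tbox, abox

knowledge_graph = {
    ("Anti-Neoplastic-Drug", "subClassOf", "Drug"),
    ("inhibit", "subPropertyOf", "has-target"),
    ("Gleevec", "type", "Anti-Neoplastic-Drug"),
}

tbox, abox = split_tbox_abox(knowledge_graph)
print("TBox:", sorted(tbox))
print("ABox:", sorted(abox))
```

Here the two subClassOf/subPropertyOf triples end up in the TBox and the single type assertion about Gleevec ends up in the ABox.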

Enterprise Knowledge Graphs

Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, Jamie Taylor. Industry-Scale Knowledge Graphs: Lessons and Challenges. Communications of the ACM, August 2019, Vol. 62 No. 8.

Open Knowledge Graphs

DBpedia is a knowledge graph extracted from structured data in Wikipedia.

Wikidata is a collaboratively edited knowledge graph, operated by the Wikimedia Foundation (which hosts Wikipedia).

            Instances/Entities     Assertions      Classes    Relations
DBpedia              5,044,223    854,294,312          760        1,355
CYC                    122,441      2,229,266      116,821      116,821
Wikidata            52,252,549    732,420,508    2,356,259        6,236
NELL                 5,120,688     60,594,443        1,187          440

Lehmann, Jens, et al. "DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia." Semantic Web 6.2 (2015): 167-195.

Malyshev, Stanislav, et al. "Getting the most out of Wikidata: Semantic technology usage in Wikipedia’s knowledge graph." International Semantic Web Conference. 2018.

Chemical Entities of Biological Interest (ChEBI) Ontology

[Figure: schema-level class hierarchy linking Pharmaceutical, Drug, Anti-Neoplastic Drug, Cardiovascular Drug and Anti-arrhythmia Drug with subClassOf edges]

Example of Ontology: schema

  • Anti-Neoplastic Drug subClassOf Drug
  • inhibit subPropertyOf has-target

Add facts

  • Gleevec type Anti-Neoplastic Drug

Ontology: schema + facts

Anti-Neoplastic-Drug subClassOf Drug .
inhibit subPropertyOf has-target .
Gleevec type Anti-Neoplastic-Drug .

Reuse requires unique names

  • Local names such as Drug, Gleevec or has-target are not globally unique, so they cannot be reused safely across ontologies

Use URIs

Anti-Neoplastic-Drug subClassOf Drug .
Gleevec type Anti-Neoplastic-Drug .
inhibit subPropertyOf has-target .

@prefix drugbank: <http://bio2rdf.org/drugbank:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

obo:CHEBI_20000 rdfs:subClassOf obo:CHEBI_23888 .

Add labels for humans

obo:CHEBI_23888 rdfs:label "drug" .
obo:CHEBI_20000 rdfs:label "Anti-Neoplastic drug" .

Ontology in RDFS: subClassOf, label, subPropertyOf

@prefix drugbank: <http://bio2rdf.org/drugbank:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

obo:CHEBI_20000 rdfs:subClassOf obo:CHEBI_23888 .
obo:CHEBI_23888 rdfs:label "drug" .
obo:CHEBI_20000 rdfs:label "Anti-Neoplastic drug" .
drugbank:DB00619 rdf:type obo:CHEBI_20000 .
obo:inhibit rdfs:subPropertyOf obo:has-target .

Graph RDF Schema (RDFS) Classes

Relationships Between Core Classes and Properties

  • rdfs:subClassOf and rdfs:subPropertyOf are transitive, by definition
  • rdfs:Class is a subclass of rdfs:Resource
    • Because every class is a resource
  • rdfs:Resource is an instance of rdfs:Class
    • rdfs:Resource is the class of all resources, so it is a class
  • Every class is an instance of rdfs:Class

Example of Ontology: A taxonomy is a hierarchy of classes

Properties Graph

Domain & Range

SubProperties

Vocabulary of RDF Schema (RDFS)

  • Classes: rdfs:Resource, rdf:Property, rdfs:Class
  • Properties: rdf:type, rdfs:subClassOf, rdfs:subPropertyOf
  • Property constraints: rdfs:domain, rdfs:range

Reasoning with ontologies

RDFS Logical Semantics

Semantics based on Inference Rules

  • Reasoning:
    • Deduce new facts based on existing ones
  • This inference system consists of inference rules of the form:
    • IF E contains certain triples THEN add to E certain additional triples
    • where E is an arbitrary set of RDF triples
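
As an illustration of the IF/THEN form, here is a minimal Python sketch of one such rule, the sub-property rule (named rdfs7 in the W3C RDF Semantics). It assumes triples are plain tuples of prefixed names; the target ex:some-protein is a hypothetical placeholder, not a fact from any dataset.

```python
def rdfs7(E):
    """IF E contains (p subPropertyOf q) and (s p o) THEN add (s q o)."""
    sub_props = {(p, q) for (p, pred, q) in E if pred == "rdfs:subPropertyOf"}
    return {(s, q, o) for (s, p, o) in E for (p2, q) in sub_props if p2 == p}

E = {
    ("obo:inhibit", "rdfs:subPropertyOf", "obo:has-target"),
    ("drugbank:DB00619", "obo:inhibit", "ex:some-protein"),  # hypothetical fact
}

new_triples = rdfs7(E) - E   # triples the rule adds to E
E |= new_triples
print(new_triples)
```

Because inhibit is declared a sub-property of has-target, the rule adds a has-target triple for the same subject and object.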

Semantics based on Inference Rules

  • RDFS specifies 44 entailment rules
  • The entailment rules are applied recursively until the graph does not change any more.
  • The result is called the deductive closure.
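
The "apply until nothing changes" loop can be sketched as follows. This is a simplified illustration with only two rules (rdfs9 for type propagation and rdfs11 for subClassOf transitivity, as they are named in the W3C RDF Semantics), not a complete RDFS reasoner. The function deductive_closure and the instance ex:some-drug are hypothetical names introduced for the example.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

def rdfs9(triples):
    """(c subClassOf d) and (x type c)  =>  (x type d)."""
    return {(x, TYPE, d)
            for (c, p1, d) in triples if p1 == SUBCLASS
            for (x, p2, c2) in triples if p2 == TYPE and c2 == c}

def rdfs11(triples):
    """(a subClassOf b) and (b subClassOf c)  =>  (a subClassOf c)."""
    return {(a, SUBCLASS, c)
            for (a, p1, b) in triples if p1 == SUBCLASS
            for (b2, p2, c) in triples if p2 == SUBCLASS and b2 == b}

def deductive_closure(triples, rules):
    """Apply all rules repeatedly until no new triple is derived."""
    closure = set(triples)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(closure)
        if derived <= closure:      # nothing new: fixpoint reached
            return closure
        closure |= derived

graph = {
    ("Cardiovascular-Drug", SUBCLASS, "Drug"),
    ("Anti-arrhythmia-Drug", SUBCLASS, "Cardiovascular-Drug"),
    ("ex:some-drug", TYPE, "Anti-arrhythmia-Drug"),   # hypothetical instance
}

closure = deductive_closure(graph, [rdfs9, rdfs11])
for triple in sorted(closure - graph):
    print(triple)
```

Starting from three triples, the loop needs more than one pass: the type assertion for Drug only becomes derivable after an earlier pass has added intermediate triples.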

Automated reasoning makes it possible to infer new knowledge

@prefix drugbank: <http://bio2rdf.org/drugbank:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .

obo:CHEBI_20000 rdfs:subClassOf obo:CHEBI_23888 .
obo:CHEBI_23888 rdfs:label "drug" .
obo:CHEBI_20000 rdfs:label "Anti-Neoplastic drug" .
drugbank:DB00619 rdf:type obo:CHEBI_20000 .
drugbank:DB00619 rdfs:label "Gleevec" .

[Figure: the triples above drawn as a graph, with drugbank:DB00619 linked by rdf:type to obo:CHEBI_20000, obo:CHEBI_20000 linked by rdfs:subClassOf to the class labelled "drug", and rdfs:label edges to "Gleevec" and "Anti-Neoplastic drug"]

Every instance is an instance of all more general classes

[Figure: the same graph with an additional, inferred rdf:type edge from drugbank:DB00619 to the class labelled "drug"]

Automated reasoning enriches the knowledge graph

The inferred triple is added to the graph:

drugbank:DB00619 rdf:type obo:CHEBI_23888 .

Another example of automated reasoning: transitivity of subClassOf

  • Cardiovascular Drug subClassOf Drug
  • Anti-arrhythmia Drug subClassOf Cardiovascular Drug
  • Inferred: Anti-arrhythmia Drug subClassOf Drug

Automated reasoning using type

drugbank:DB00619 rdf:type obo:CHEBI_20000 .

Inferred: obo:CHEBI_20000 rdf:type rdfs:Class .
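
One way to see why obo:CHEBI_20000 is inferred to be a class: RDFS itself declares that the range of rdf:type is rdfs:Class, and the range rule (rdfs3 in the W3C RDF Semantics) types the object of every triple by the range of its predicate. The sketch below applies that rule once in Python, assuming triples are plain tuples; it is an illustration, not a full reasoner.

```python
def rdfs3(E):
    """IF E contains (p rdfs:range c) and (s p o) THEN add (o rdf:type c)."""
    ranges = {(p, c) for (p, pred, c) in E if pred == "rdfs:range"}
    return {(o, "rdf:type", c)
            for (s, p, o) in E
            for (p2, c) in ranges if p2 == p}

E = {
    ("rdf:type", "rdfs:range", "rdfs:Class"),              # RDFS axiomatic triple
    ("drugbank:DB00619", "rdf:type", "obo:CHEBI_20000"),   # asserted fact
}

inferred = rdfs3(E) - E
print(inferred)
```

A single application of the rule yields the triple stating that obo:CHEBI_20000 is an rdfs:Class.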

Domain & Range Semantics

Standard Vocabulary

  • dc: Dublin Core (predicates for describing documents), http://purl.org/dc/elements/1.1/
  • foaf: Friend Of A Friend (relationships between people), http://xmlns.com/foaf/0.1/
  • cc: Creative Commons (types of licences), http://creativecommons.org/ns#
  • schema.org

Main classes and properties of FOAF

Dublin Core

Creative Commons

Is RDFS enough for formal description of knowledge in a domain?

No, RDFS cannot express the following axioms:

  • Prokaryotic and eukaryotic cells are disjoint
  • A protein has some amino acid as a part

A more expressive language is needed

Web Ontology Language (OWL)

  • Extension of RDFS (RDF Schema) based on description logic (DL), with an enhanced vocabulary to express knowledge about classes, properties and individuals
    • Disjoint classes
      • Prokaryotic owl:disjointWith Eukaryotic
    • Quantification (some, only, cardinalities from 0 to n)

:Protein owl:equivalentClass [
    rdf:type owl:Restriction ;
    owl:onProperty :hasPart ;
    owl:someValuesFrom :AminoAcid
] .

More axioms enable more interesting reasoning
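
To make the existential ("some") restriction concrete, the Python sketch below checks whether an individual has at least one hasPart value typed as an amino acid. It treats the triple set as complete, which is a simplifying assumption: OWL uses open-world semantics, so a real reasoner would not conclude that the restriction fails just because no suitable part is listed. The individuals (:insulin, :glycine1, :water, :hydrogen1), the class :Atom and the helper name are hypothetical.

```python
def satisfies_some_values_from(individual, prop, cls, triples):
    """True if `individual` has at least one `prop` value that is typed `cls`."""
    values = {o for (s, p, o) in triples if s == individual and p == prop}
    return any((v, "rdf:type", cls) in triples for v in values)

triples = {
    (":insulin", ":hasPart", ":glycine1"),
    (":glycine1", "rdf:type", ":AminoAcid"),
    (":water", ":hasPart", ":hydrogen1"),
    (":hydrogen1", "rdf:type", ":Atom"),
}

print(satisfies_some_values_from(":insulin", ":hasPart", ":AminoAcid", triples))
print(satisfies_some_values_from(":water", ":hasPart", ":AminoAcid", triples))
```

Only :insulin has a listed part typed :AminoAcid, so only it matches the restriction in this closed-world check.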

Reasoning over OWL ontologies

  • Consistency: determines whether the ontology contains contradictions.

  • Satisfiability: determines whether classes can have instances

  • + reasoning used in RDFS

Contradiction implies inconsistency

  • Prokaryotic-cell owl:disjointWith Eukaryotic-cell
  • Fungal-cell rdfs:subClassOf Eukaryotic-cell

  • Spore rdf:type Prokaryotic-cell
  • Spore rdf:type Eukaryotic-cell
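
The contradiction above can be detected mechanically. The following Python sketch looks for individuals that are typed with two classes declared disjoint. It is a deliberately narrow check under stated assumptions: triples are plain tuples, only asserted types are examined (a full reasoner would first add inferred types, for example via subClassOf), and the function name is hypothetical.

```python
def find_disjointness_violations(triples):
    """Return (individual, class_a, class_b) for individuals typed with two disjoint classes."""
    disjoint = {(a, b) for (a, p, b) in triples if p == "owl:disjointWith"}
    types = {}
    for (s, p, o) in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return {(x, a, b)
            for (a, b) in disjoint
            for x, classes in types.items()
            if a in classes and b in classes}

ontology = {
    ("Prokaryotic-cell", "owl:disjointWith", "Eukaryotic-cell"),
    ("Fungal-cell", "rdfs:subClassOf", "Eukaryotic-cell"),
    ("Spore", "rdf:type", "Prokaryotic-cell"),
    ("Spore", "rdf:type", "Eukaryotic-cell"),
}

violations = find_disjointness_violations(ontology)
print(violations)
print("consistent" if not violations else "inconsistent")
```

Spore is reported because it is asserted to belong to both disjoint classes, so the ontology is inconsistent.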

Ontology construction tool

Ontology enables the construction of new applications

Conclusion

  • Life Sciences Linked Open Data (LSLOD) provides web-scale data integration for life sciences data sources

  • Semantic Web technologies provide interesting solutions
    • RDF, RDFS and OWL address the heterogeneity problems (syntactic and semantic)
    • Linked Open Data principles facilitate web-scale data integration
    • The SPARQL query language allows federated queries on LSLOD

  • LSLOD opens the door to new applications and discoveries in the life sciences

Current research on maintaining LSLOD

  • LSLOD (as LOD) suffers from the problems of accessibility/availability of SPARQL servers
    • SaGe: a SPARQL query engine for public Linked Data providers (http://sage.univ-nantes.fr/)
  • FAIR principles:
    • The FAIR Data Principles are a set of guiding principles in order to make data Findable, Accessible, Interoperable and Reusable (https://www.nature.com/articles/sdata201618)
  • schema.org
  • Bioschemas: findability of data in the life sciences
    • https://bioschemas.org/

Thomas Minier, Hala Skaf-Molli and Pascal Molli. "SaGe: Web Preemption for Public SPARQL Query services" in Proceedings of the 2019 World Wide Web Conference (WWW'19), San Francisco, USA, May 13-17, 2019.

Semantic Web and Linked Data for Life Science

Hala Skaf-Molli

Hala.Skaf@univ-nantes.fr

Associate Professor, HDR

http://pagesperso.ls2n.fr/~skaf-h