1 of 43

Introducing Topic Maps

  • A Topic Map is like a library without all the books
    • A Topic Map is indexical
      • Like a card catalog
      • Each topic has its own representation
      • Improving on a card catalog, a topic can be identified many different ways
      • Captures metadata and optional content
    • A Topic Map is relational
      • Like a good road map
      • Topics are connected by associations (relations)
      • Topics point to their occurrences in the territory
    • A Topic Map is organized
      • Multiple records on the same topic are co-located (stored as one topic) in the map

2 of 43

Augmented Claim Craft Ecosystem:

HyperKnowledge-OpenSherlock Overview

CoronaWhy Seminar

Jack Park

TopicQuests Foundation

ORCID: https://orcid.org/0000-0002-4356-4928

Marc-Antoine Parent

Conversence

ORCID: https://orcid.org/0000-0003-4159-7678

3 of 43

Does ApoE4 cause Alzheimer’s?

A focus question about claims people want answered

Wikipedia

4 of 43

A motivating story

  • Alzheimer’s Context *
    • Dr. Trumble and the Tsimané ** Project ***
      • Anthropologist studying evolutionary medicine
      • Indigenous people, Bolivia
      • Higher elderly cognitive performance with copy of ApoE4 gene
    • Dr. Liddelow studying immune response in brains
      • Some people die without dementia but with brains clogged with Alzheimer’s pathology

  • A Quote (emphasis mine):

“I asked Dr. Liddelow whether he was familiar with the Tsimané research. He admitted that he was not — the field of evolutionary biology is distant from his own. But he said the hypothesis that the ApoE4 gene evolved to protect our brains from the effects of parasitic infection made perfect sense. “That’s absolutely in line with what we found. For our ancestors, an ApoE4 gene could have been beneficial,” Dr. Liddelow said, in part because it would have helped the astrocytes go on the attack.”

*https://www.nytimes.com/2017/07/14/opinion/sunday/alzheimers-cure-south-america.html

**https://en.wikipedia.org/wiki/Tsiman%C3%A9

***http://www.unm.edu/~tsimane/

5 of 43

From documents to augmenting knowledge work

Documents

Structured Documents

Basic claim discovery

Entity identification

Augmented Claim Craft

CoronaWhy

OpenSherlock 1

spaCy

?

6 of 43

Claim representation in HyperKnowledge

Aim: To be able to bring claims together: compare and federate claims, make claims about claims...

The data model should be rich enough to express claims found in the literature, and claims about those claims.

7 of 43

Basic claim representation

Hydroxychloroquine is used to treat Covid-19

8 of 43

Basic claim representation

Hydroxychloroquine is used to treat Covid-19

Subject - Predicate - Object (used in RDF)

Each concept (topic) has an identifier (URI) to reduce ambiguity

Covid-19 wiki:Q84263196

Hydroxychloroquine wiki:Q421094

Drug used for treatment

wikip:P2176
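A minimal sketch of the subject-predicate-object form, assuming that the short prefixes on the slide expand to Wikidata's entity and direct-property namespaces. The `ex:` prefix and the `ex:hydroxychloroquine` identifier are placeholders introduced here for illustration, not identifiers from the talk.

```python
from typing import NamedTuple

# Assumed prefix expansions; the slide only shows the short forms.
PREFIXES = {
    "wiki": "http://www.wikidata.org/entity/",
    "wikip": "http://www.wikidata.org/prop/direct/",
    "ex": "http://example.org/",  # placeholder namespace for this sketch
}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'wiki:Q84263196' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# P2176 ("drug used for treatment") links a disease to a drug.
claim = Triple("wiki:Q84263196", "wikip:P2176", "ex:hydroxychloroquine")
full_claim = Triple(*(expand(term) for term in claim))
```

Using identifiers rather than names means two sources that write "Covid-19" and "COVID 19" can still be recognised as talking about the same subject.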

9 of 43

“Citation needed”: qualifying claims

paper “Hydroxychloroquine and azithromycin as a treatment of COVID‐19: results of an open‐label non‐randomized clinical trial” describes a study where Hydroxychloroquine was used to treat Covid-19.

10 of 43

“Citation needed”: qualifying claims

paper “Hydroxychloroquine and azithromycin as a treatment of COVID‐19: results of an open‐label non‐randomized clinical trial” describes a study where Hydroxychloroquine was used to treat Covid-19.

RDF does this through reification; Wikidata simply gives each claim (statement) its own identifier.

Give an identity to the claim itself, so we can make further claims about that claim, such as provenance, authority, etc.

Hydroxychloroquine wiki:Q421094

Covid-19 wiki:Q84263196

Drug used for treatment

wikip:P2176

Claim

cc00feb7-4b9b-121d-898b-7c6652b2b406

Hydroxychloroquine and azithromycin …DOI:10.1016/J.IJANTIMICAG.2020.105949

rdf:subject

rdf:predicate

rdf:object

Stated in

wikip:P248
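One possible way to sketch "a claim with its own identity" is shown here, using a random UUID as the claim identifier. The `Claim` class and its field names are assumptions made for illustration; they are not the HyperKnowledge data model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    obj: str
    # Each claim gets its own identifier, so other claims can refer to it.
    claim_id: str = field(default_factory=lambda: str(uuid.uuid4()))


base = Claim("Covid-19", "drug used for treatment", "Hydroxychloroquine")

# A claim about the claim: its subject is the identifier of the first claim.
provenance = Claim(
    base.claim_id, "stated in", "DOI:10.1016/J.IJANTIMICAG.2020.105949"
)
```

Because `provenance` is itself a `Claim`, it has an identifier too, so further statements (authority, disagreement, retraction) can be layered on in the same way.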

11 of 43

Complex claims

The experimental protocol involved oral absorption of 200 mg of hydroxychloroquine three times a day for 10 days, for 20 patients whose average age was 51 years (σ=19)

Many claims involve many entities in complex relationships, and should be represented as such.

12 of 43

Complex claims

The experimental protocol involved oral absorption of 200 mg of hydroxychloroquine three times a day for 10 days, for 20 patients whose average age was 51 years (σ=19)

Many claims involve many entities in complex relationships, and should be represented as such.

Topic mapping, frames (Minsky), KIF

Medical Protocol
  • substance: Hydroxychloroquine wiki:Q421094
  • disease: Covid-19 wiki:Q84263196
  • amount: 200mg
  • frequency: 3x/day
  • duration: 10 days
  • Study group: Group 1
    • size: 20 patients
    • age: μ=51 σ=19
  • Control group: Group 2
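A frame-style representation of the protocol can be sketched as a record with named slots. The class and slot names below are illustrative assumptions, and attaching the size and age to the study group follows the example sentence rather than any stated schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    role: str                      # e.g. "study" or "control"
    size: Optional[int] = None
    mean_age: Optional[float] = None
    age_sd: Optional[float] = None


@dataclass
class MedicalProtocol:
    substance: str
    disease: str
    amount_mg: int
    times_per_day: int
    duration_days: int
    groups: List[Group]

    def total_dose_mg(self) -> int:
        """Total dose per patient over the whole protocol."""
        return self.amount_mg * self.times_per_day * self.duration_days


protocol = MedicalProtocol(
    substance="Hydroxychloroquine",
    disease="Covid-19",
    amount_mg=200,
    times_per_day=3,
    duration_days=10,
    groups=[Group("study", size=20, mean_age=51, age_sd=19), Group("control")],
)
```

Keeping every entity in its own slot lets a query ask about, say, the duration or the group size without parsing the original sentence again.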

13 of 43

Topic Map Structure

14 of 43

Some claims are hypothetical

If social distancing measures are not followed, we risk a second wave.

15 of 43

Some claims are hypothetical

If social distancing measures are not followed, we risk a second wave.

We need a way to represent hypothetical scenarios.

The hypothetical world is a whole separate universe of discourse, which we represent as a subgraph. (Sowa’s Conceptual graphs)

Hypothetical situation
  • Social distancing norms: Compliance level < 80% in Target population
  • consequence: Event: infection rate > 50 % rise
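The idea of a separate universe of discourse can be sketched by keeping hypothetical statements in their own named graph, so that nothing inside it is asserted in the main graph. The `Graph` class and the wording of the triples are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

TripleT = Tuple[str, str, str]


@dataclass
class Graph:
    name: str
    triples: Set[TripleT] = field(default_factory=set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))


asserted = Graph("asserted")
scenario = Graph("scenario: distancing not followed")

# Statements that hold only inside the hypothetical scenario.
cause_effect = (
    "compliance level < 80%", "consequence", "infection rate rise > 50%"
)
scenario.add(*cause_effect)

# The asserted world refers to the scenario as a whole, not to its contents.
asserted.add(scenario.name, "is a", "hypothetical situation")
```

Querying `asserted` never returns the infection-rate statement, which is the point: the scenario can be discussed, compared, or contested without being claimed as fact.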

16 of 43

Points of view should be explicit

Covid-19, as depicted by Fox News, is not more serious than a minor cold.

Epidemiologists’ estimates of Covid-19 transmission rates have not been explained by virologists.

The results of laboratory X have been contested.

Claims are made by agents, and adopted by communities. It is sometimes important to distinguish references to a topic as it is understood by a specific agent or community.

17 of 43

Points of view should be explicit

Covid-19, as depicted by Fox News, is not more serious than a minor cold.

Epidemiologists’ estimates of Covid-19 transmission rates have not been explained by virologists.

The results of laboratory X have been contested.

Claims are made by agents, and adopted by communities. It is sometimes important to distinguish references to a topic as it is understood by a specific agent or community.

Each claim has to be identified as coming from a specific source, maintained by agents. The properties and links attributed to a topic can be different for each source. Source federation is explicit.
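A small sketch of source-scoped claims, assuming a store keyed first by source: the same topic can carry different property values per source, and a view is always built from an explicit list of sources. The `SourcedStore` class, the source names, and the values are hypothetical.

```python
from collections import defaultdict


class SourcedStore:
    def __init__(self):
        # source -> {(topic, property): value}
        self._by_source = defaultdict(dict)

    def assert_claim(self, source, topic, prop, value):
        self._by_source[source][(topic, prop)] = value

    def view(self, topic, prop, sources):
        """Return the value each explicitly chosen source gives, if any."""
        result = {}
        for source in sources:
            claims = self._by_source.get(source, {})
            if (topic, prop) in claims:
                result[source] = claims[(topic, prop)]
        return result


store = SourcedStore()
store.assert_claim("source_1", "A", "y", 7)
store.assert_claim("source_2", "A", "y", 6)
store.assert_claim("source_3", "A", "y", 8)

both = store.view("A", "y", ["source_1", "source_2"])
```

Nothing is merged implicitly: a reader who federates `source_1` and `source_2` sees both values side by side and decides what to do with the disagreement.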


18 of 43

Claims are made and retracted

Lab X claimed to find reinfection after remission, but those cases were due to false negative testing in an asymptomatic phase.

People can change their minds; a claim can be a correction to an earlier claim.

19 of 43

Claims are made and retracted

Lab X claimed to find reinfection after remission, but those cases were due to false negative testing in an asymptomatic phase.

We view claims from a source as an event stream. Some events in the stream can explicitly contradict earlier events.

[Diagram: event stream for topic A, with successive values for properties x and y]

{
  "@id": "A",
  "x": ["B", "D"],
  "y": 5
}
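A sketch of how a current state could be computed by folding over a claim stream. The event vocabulary (`add`, `set`, `retract`) and the particular sequence of events are assumptions chosen so that the result matches the JSON state above; the talk does not specify them.

```python
def fold(events):
    """Replay events in order and return the resulting state per topic."""
    state = {}
    for op, topic, prop, value in events:
        topic_state = state.setdefault(topic, {"@id": topic})
        if op == "set":          # single-valued property: later value wins
            topic_state[prop] = value
        elif op == "add":        # multi-valued property: append a link
            topic_state.setdefault(prop, []).append(value)
        elif op == "retract":    # explicitly contradict an earlier event
            topic_state[prop].remove(value)
    return state


events = [
    ("add", "A", "x", "B"),
    ("set", "A", "y", 3),
    ("add", "A", "x", "C"),
    ("set", "A", "y", 5),
    ("retract", "A", "x", "C"),
    ("add", "A", "x", "D"),
]

current = fold(events)["A"]
```

Since the stream itself is never edited, the earlier value of `y` and the retracted link to `C` remain available to anyone who replays only a prefix of the events.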

20 of 43

Different communities use different names or identifiers

English names for Covid-19 in Wikidata: 2019-nCoV acute respiratory disease ; coronavirus disease 2019 ; COVID19 ; COVID 19 ; Covid-19 ; 2019 novel coronavirus pneumonia ; Coronavirus disease 2019 ; nCOVD19 ; nCOVD 19 ; nCOVD-19 ; COVID-2019 ; seafood market pneumonia ; Wuhan pneumonia ; 2019 NCP ; WuRS ; severe acute respiratory syndrome type 2 ; SARS-CoV-2 infection ; 2019 novel coronavirus respiratory syndrome ; Wuhan respiratory syndrome ; novel coronavirus ; coronavirus

Of course, we would also want to search for 2019冠状病毒病, etc.

RDF identifiers in Wikidata:

<http://www.wikidata.org/wiki/Q84263196>

<https://catalogue.bnf.fr/ark:/12148/cb17874453m>

<https://d-nb.info/gnd/1206347392>

<https://id.loc.gov/authorities/sh2020000570>

<https://meshb.nlm.nih.gov/#/record/ui?ui=C000657245>

<http://id.nlm.nih.gov/mesh/T001007884>

<http://id.nlm.nih.gov/mesh/M000681578>

<https://www.courrierinternational.com/sujet/covid-19>

<http://www.disease-ontology.org/?id=DOID:0080600>

<http://www.diseasesdatabase.com/ddb60833.htm>

<http://emedicine.medscape.com/article/2500114-overview>

<https://www.britannica.com/science/COVID-19>

<https://www.enciclopedia.cat/EC-GEC-23470930.xml>

<https://icd.who.int/browse10/2019/en#/U07.1>

<https://icd.who.int/browse10/2019/en#/U07.2>

<https://icd.who.int/dev11/f/en#/http://id.who.int/icd/entity/1790791774>

<https://www.malacards.org/card/2019_novel_coronavirus>

<https://www.ne.se/uppslagsverk/encyklopedi/lång/covid-19>

<https://www.nhs.uk/conditions/coronavirus-covid-19>

<http://www.omegawiki.org/DefinedMeaning:1733730>

<https://philpapers.org/browse/covid-19>

<https://www.quora.com/topic/COVID>

<http://snomed.info/id/840539006>

<https://sml.snl.no/covid-19>

<https://www.reddit.com/r/Coronavirus/>

<https://www.reddit.com/r/COVID19/>

<http://www.treccani.it/enciclopedia/ricerca/COVID>

<https://tvtropes.org/pmwiki/pmwiki.php/UsefulNotes/CoronavirusDisease2019Pandemic>

<http://www.yso.fi/onto/yso/p38829>

<https://denstoredanske.lex.dk/COVID-19>

Note missing: kg:/m/01cpyy (Google)

21 of 43

Different communities use different names or identifiers

Many concepts share the same name. Many names share the same concept.

Names have to be disambiguated. Global concept identifiers can be tentatively identified, but all identifiers are tagged with their source, and the identifier X as used by source A may not correspond to the concept referred to by X in source B.

Unifying topics is the domain of topic mapping.
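A sketch of name disambiguation with source-tagged identifiers: a name lookup returns candidates, each labelled with the source that uses it, rather than a single global answer. The `NameIndex` class and its normalisation rule (case-folding and dropping punctuation and spaces) are assumptions for illustration; the two identifiers are taken from the list above.

```python
from collections import defaultdict


class NameIndex:
    def __init__(self):
        # normalised name -> set of (source, identifier) pairs
        self._index = defaultdict(set)

    @staticmethod
    def normalize(name):
        """Case-fold and keep only letters and digits."""
        return "".join(ch for ch in name.casefold() if ch.isalnum())

    def register(self, name, source, identifier):
        self._index[self.normalize(name)].add((source, identifier))

    def candidates(self, name):
        """All (source, identifier) pairs known for this name, sorted."""
        return sorted(self._index.get(self.normalize(name), set()))


index = NameIndex()
index.register("Covid-19", "wikidata", "Q84263196")
index.register("COVID 19", "mesh", "C000657245")

found = index.candidates("covid19")
```

Whether the two candidates denote the same concept is a separate, explicit decision; the index only reports who uses which identifier for the name.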

22 of 43

Topic Map as a federation platform

  • A topic map works aggressively to ensure that each individual subject represented in the map has one and only one location.
  • To accomplish that, when two subject representations in the map are judged to be about the same subject, a new representation (a VirtualProxy) is created that non-redundantly contains the information from both, and from any other topic about that subject that later enters the topic map.

23 of 43

Federating Silos: introduction

  • Siloed Research Topics
    • Raynaud’s Syndrome Therapies
    • Fish Oil
  • Machine Reading collects graph structures from different sources
    • Form tuple-like structures which are graphs

24 of 43

Federating Silos: Topic Mapping

  • TopicMap Process
    • Rule:
      • One Location in the Map for each Subject
      • Federates (merges topics about the same subject) collected from different resources
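The "one location per subject" rule can be sketched with a parent-pointer structure in which merging two topics creates a new proxy that both point to. The `TopicMap` class, the proxy naming scheme, and the sample topics are hypothetical; this is not the TopicQuests implementation.

```python
class TopicMap:
    def __init__(self):
        self._parent = {}   # topic id -> id of the topic that represents it
        self._props = {}    # representative id -> {property: set of values}

    def add(self, topic_id, **props):
        self._parent[topic_id] = topic_id
        self._props[topic_id] = {k: {v} for k, v in props.items()}

    def find(self, topic_id):
        """Follow parent pointers to the single location for this subject."""
        while self._parent[topic_id] != topic_id:
            topic_id = self._parent[topic_id]
        return topic_id

    def merge(self, a, b):
        """Create a proxy holding the non-redundant union of both topics."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        proxy = f"proxy({ra}+{rb})"
        merged = {}
        for root in (ra, rb):
            for key, values in self._props[root].items():
                merged.setdefault(key, set()).update(values)
        self._parent[proxy] = proxy
        self._parent[ra] = self._parent[rb] = proxy
        self._props[proxy] = merged
        return proxy

    def props(self, topic_id):
        return self._props[self.find(topic_id)]


tm = TopicMap()
tm.add("silo1:fish-oil", label="Fish oil")
tm.add("silo2:fish-oil", label="Fish oil", studied_for="Raynaud's syndrome")
proxy_id = tm.merge("silo1:fish-oil", "silo2:fish-oil")
```

The original topics are kept, so the provenance of each property is not lost; lookups through either original identifier simply resolve to the proxy.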

25 of 43

Topic merging opens questions and creates events

  • Does Fish Oil qualify as a Raynaud’s therapy?
    • Turns out Yes
  • Topic Merge events feed back into the HyperKnowledge ecosystem

26 of 43

Distributed federation in HyperKnowledge

Each source maintains its own table of topic merges, and federated queries must keep track of those equivalences.

This can be expanded (with normalization) to identification of composite topics.

The plan is for the HK ecosystem to maintain a probabilistic map (a Bloom filter) of which sources maintain information about which topics.
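A minimal Bloom filter sketch, to illustrate what a probabilistic map of this kind offers: a source can advertise the topics it knows about in a compact bit set, with possible false positives but no false negatives. The sizes, the hashing scheme, and the one-filter-per-source layout are assumptions, not the planned HyperKnowledge design.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, item):
        """Derive num_hashes bit positions from salted SHA-256 digests."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# One filter per source, listing the topics that source has claims about.
source_topics = BloomFilter()
source_topics.add("wiki:Q84263196")
```

A federated query would consult such filters first and contact only the sources whose filter answers "possibly present" for the topic.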

27 of 43

Comparing claims

The research on hydroxychloroquine in study X was contradicted in study Y.

132/203 virologists consulted believe hydroxychloroquine’s side effects to be too severe for Covid-19 treatment.

28 of 43

Comparing claims

The research on hydroxychloroquine in study X was contradicted in study Y.

132/203 virologists consulted believe hydroxychloroquine’s side effects to be too severe for Covid-19 treatment.

Once claims have an identity, we can compare claims and make higher-level claims.

Claim 1: Hydroxychloroquine, Drug used for treatment, Covid-19 (DOI:10.1016/J.ijantimicag.2020.105949)

Claim 2: Hydroxychloroquine, Side-effect, refractory ventricular arrhythmia (DOI:10.1080/15563650500514558)

risk/benefit analysis
  • risks: Claim 2
  • benefits: Claim 1
  • outcome: Risks outweigh benefits

29 of 43

Comparing claims

The research on hydroxychloroquine in study X was contradicted in study Y.

132/203 virologists consulted believe hydroxychloroquine’s side effects to be too severe for Covid-19 treatment.

Claim streams representing individual points of view can be combined into “community” streams, and into combined values.


30 of 43

So what can be a stream?

Comparing claims allows combining claims in larger aggregates

  • Base case: One person’s point of view
  • One team (guild), with a procedure for combining its members’ PoV streams (a rule such as majority, consent, etc.)
  • A thematic collation, with points of dis/agreement marked without resolution
  • A curated thematic overview, with data-driven evidence
  • Eventually: global federation

Opposite end of the spectrum: Casual small streams (like git branches)

  • A thought experiment or hypothetical situation
  • A computed slice (query) of a stream can be treated like a stream
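As one example of a team-level combination rule, the sketch below merges members' stances by simple majority and leaves ties unresolved. The function name, the data layout, and the member and claim names are assumptions; other rules (consent, weighting) would replace the comparison step.

```python
from collections import Counter


def combine_by_majority(member_views):
    """member_views maps member -> {claim_id: True (accept) / False (reject)}."""
    votes = {}
    for view in member_views.values():
        for claim_id, stance in view.items():
            votes.setdefault(claim_id, Counter())[stance] += 1

    combined = {}
    for claim_id, counter in votes.items():
        if counter[True] > counter[False]:
            combined[claim_id] = True
        elif counter[False] > counter[True]:
            combined[claim_id] = False
        else:
            combined[claim_id] = None  # tie: marked, not resolved
    return combined


guild = {
    "member_1": {"claim_1": True, "claim_2": True},
    "member_2": {"claim_1": True, "claim_2": False},
    "member_3": {"claim_1": False},
}

team_view = combine_by_majority(guild)
```

The output has the same shape as a single member's view, so a team stream can in turn be fed into a higher-level combination.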

31 of 43

Inference engine ecosystem

Event sourcing as a backbone for knowledge-based microservices

Services subscribe to claims and produce calculations; the main queue subscribes to those calculations.

Reactive calculations

E.g.: rule-based inference, live query maintenance, machine learning, inference combination, etc.
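The subscribe-and-publish loop can be sketched in-process as follows. This is a toy stand-in for a real message broker; the `EventBus` class, the event kinds, and the example rule are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, kind, handler):
        self._subscribers[kind].append(handler)

    def publish(self, kind, payload):
        self._queue.append((kind, payload))

    def run(self):
        """Deliver queued events until no handler publishes anything new."""
        while self._queue:
            kind, payload = self._queue.popleft()
            for handler in self._subscribers[kind]:
                handler(self, payload)


def inverse_acronym_service(bus, claim):
    """Rule-based service: derive the inverse of a 'has acronym' claim."""
    subject, predicate, obj = claim
    if predicate == "has acronym":
        bus.publish("calculation", (obj, "acronym of", subject))


main_queue = []
bus = EventBus()
bus.subscribe("claim", inverse_acronym_service)
bus.subscribe("calculation", lambda _bus, calc: main_queue.append(calc))

bus.publish("claim", ("Type 2 Diabetes mellitus", "has acronym", "T2DM"))
bus.run()
```

Each service only knows the event kinds it consumes and produces, which is what lets rule-based, query-maintenance, and machine-learning services be added independently.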


32 of 43

Inference engine ecosystem

Synthesis as a service

Synthesis can be simple statistics (who believes this), sample size, Bayesian, etc.

Simple awareness of which claims are established or contested (and by whom) is useful


33 of 43

Inference engine ecosystem

Augmented collaboration: start with a single-source view of a claim stream


34 of 43

Inference engine ecosystem

Augmented collaboration: become aware of relevant claims from federation stream


35 of 43

From documents to augmenting knowledge work

HyperKnowledge

Documents

Structured Documents

Basic claim discovery

Entity identification

Augmented Claim Craft

  • Higher order claim discovery
  • Claim combination
  • Rule-based claim micro-services
  • ML-based claims
  • Human claim identification

CoronaWhy

OpenSherlock 1

spaCy

!

36 of 43

Structured documents to claims with OpenSherlock

  • Basic Setup
    • Each document is
      • mapped to a JSON structure and transferred to a Document database
      • broken into individual paragraphs
    • Each paragraph becomes a Kafka event
  • Machine Reading
    • From paragraph Kafka events, each paragraph is
      • Broken into sentences by spaCy
    • Each sentence is
      • Parsed by spaCy
      • Parsed by LinkGrammar parser
      • Parse results are processed by a tuple detector to identify claims
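The first stages of this pipeline can be sketched with standard-library stand-ins: a generator of paragraph events in place of Kafka, and a naive regular-expression splitter in place of spaCy's sentence segmentation. The JSON layout (`id`, `paragraphs`) and the function names are assumptions; parsing and tuple detection are not shown.

```python
import json
import re


def paragraph_events(doc_json):
    """Yield one event per paragraph of a JSON-encoded document."""
    doc = json.loads(doc_json)
    for index, text in enumerate(doc["paragraphs"]):
        yield {"doc_id": doc["id"], "paragraph": index, "text": text}


def split_sentences(text):
    """Naive stand-in for a real sentence segmenter: split after . ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


document = json.dumps({
    "id": "doc-1",
    "paragraphs": ["First claim. Second claim.", "Third claim."],
})

events = list(paragraph_events(document))
sentences = split_sentences(events[0]["text"])
```

Carrying `doc_id` and the paragraph index on every event keeps each downstream claim traceable to the passage it came from.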

37 of 43

OpenSherlock: example sentence

The pandemic of obesity, type 2 diabetes mellitus (T2DM) and nonalcoholic fatty liver disease (NAFLD) has frequently been associated with dietary intake of saturated fats (1) and specifically with dietary palm oil (PO) (2).

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5272194/

38 of 43

OpenSherlock: expected claims from that sentence

Obesity associated with saturated fats

Obesity associated with palm oil

T2DM associated with saturated fats

Type 2 Diabetes mellitus has acronym T2DM

T2DM associated with palm oil

NAFLD associated with saturated fats

Nonalcoholic Fatty Liver Disease has acronym NAFLD

NAFLD associated with palm oil
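The expected claims follow a simple pattern: every subject in the coordinated list is paired with every object, and each parenthesised abbreviation yields a separate acronym claim. A sketch of that expansion, with the subjects, objects, and acronyms supplied by hand (the function name is hypothetical, and detecting these lists from a parse is not shown):

```python
from itertools import product


def expand_claims(subjects, predicate, objects, acronyms):
    """Pair every subject with every object, then add the acronym claims."""
    claims = [(s, predicate, o) for s, o in product(subjects, objects)]
    claims += [(name, "has acronym", short) for name, short in acronyms.items()]
    return claims


claims = expand_claims(
    subjects=["Obesity", "T2DM", "NAFLD"],
    predicate="associated with",
    objects=["saturated fats", "palm oil"],
    acronyms={
        "Type 2 Diabetes mellitus": "T2DM",
        "Nonalcoholic Fatty Liver Disease": "NAFLD",
    },
)
```

Three subjects and two objects give six association claims, and the two acronym claims bring the total to the eight listed above.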

39 of 43

“Obesity associated with saturated fats”: The Predicate

40 of 43

“Obesity associated with saturated fats”: The Object

41 of 43

“Obesity associated with saturated fats”: The Subject

42 of 43

Next steps

Higher-order claims are still beyond current NLP techniques, but deep learning tools can augment the intelligence of researchers identifying claims, and symbolic AI can be used to identify logical connections and contradictions.

The HyperKnowledge federation can help researchers craft higher-order claims by identifying both the logical and social neighbourhood of claims.

We would like this ecosystem to be how the next Drs. Liddelow and Trumble get to be aware of one another.

43 of 43

References

https://hyperknowledge.org

https://topicquests.org

RDF, W3C

Wikidata data model primer

Patrick Durusau, Steven R. Newcomb, and Robert Barta. Topic Maps Reference Model. ISO standard 13250-5 CD, November 2007.

John F. Sowa. Conceptual Graphs. In Handbook of Knowledge Representation, pages 213–237. Elsevier, 2008. ISBN: 9780444522115.

Knowledge Interchange Format, Stanford

https://ipld.io