Introducing Topic Maps
Augmented Claim Craft Ecosystem:
HyperKnowledge-OpenSherlock Overview
CoronaWhy Seminar
Jack Park
TopicQuests Foundation
ORCID: https://orcid.org/0000-0002-4356-4928
Marc-Antoine Parent
Conversence
ORCID: https://orcid.org/0000-0003-4159-7678
Does ApoE4 cause Alzheimer’s?
A focus question about claims people want answered
Wikipedia
A motivating story
“I asked Dr. Liddelow whether he was familiar with the Tsimané research. He admitted that he was not — the field of evolutionary biology is distant from his own. But he said the hypothesis that the ApoE4 gene evolved to protect our brains from the effects of parasitic infection made perfect sense. “That’s absolutely in line with what we found. For our ancestors, an ApoE4 gene could have been beneficial,” Dr. Liddelow said, in part because it would have helped the astrocytes go on the attack.”
*https://www.nytimes.com/2017/07/14/opinion/sunday/alzheimers-cure-south-america.html
**https://en.wikipedia.org/wiki/Tsiman%C3%A9
***http://www.unm.edu/~tsimane/
From documents to augmenting knowledge work
Documents
Structured Documents
Basic
claim discovery
Entity identification
Augmented Claim Craft
CoronaWhy
OpenSherlock 1
Spacy
?
Claim representation in HyperKnowledge
Aim: To be able to bring claims together: compare and federate claims, make claims about claims...
The data model should be rich enough to express claims found in the literature, and claims about those claims.
Basic claim representation
Hydroxychloroquine is used to treat Covid-19
Subject - Predicate - Object (used in RDF)
Each concept (topic) has an identifier (URI) to reduce ambiguity
Covid-19 (wiki:Q84263196)
Hydroxychloroquine (wiki:Q421094)
Drug used for treatment
wikip:P2176
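As a rough illustration of the subject-predicate-object shape above, the following sketch stores a claim as three identifiers and keeps human-readable labels in a separate table. The `Triple` class, the `LABELS` table and the `render` helper are hypothetical names introduced for this example, and the hydroxychloroquine identifier is an assumed Wikidata ID. Following Wikidata's use of P2176, the disease is placed in the subject slot.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement; each slot holds an identifier."""
    subject: str
    predicate: str
    obj: str


# Labels are kept apart from identifiers, so names can vary without changing the claim.
LABELS = {
    "wiki:Q84263196": "Covid-19",
    "wiki:Q421094": "Hydroxychloroquine",  # assumed Wikidata ID
    "wikip:P2176": "drug used for treatment",
}

claim = Triple("wiki:Q84263196", "wikip:P2176", "wiki:Q421094")


def render(t: Triple) -> str:
    """Show a triple with labels where known, falling back to the raw identifier."""
    return " - ".join(LABELS.get(part, part) for part in t)


print(render(claim))
```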
“Citation needed”: qualifying claims
paper “Hydroxychloroquine and azithromycin as a treatment of COVID‐19: results of an open‐label non‐randomized clinical trial” describes a study where Hydroxychloroquine was used to treat Covid-19.
Give an identity to the claim itself, so that we can make further claims about that claim, such as its provenance or authority.
RDF does this through reification; Wikidata gives an identity to each statement (which it builds from snaks).
Hydroxychloroquine (wiki:Q421094)
Covid-19 (wiki:Q84263196)
Drug used for treatment
wikip:P2176
Claim
cc00feb7-4b9b-121d-898b-7c6652b2b406
Hydroxychloroquine and azithromycin … (DOI:10.1016/J.IJANTIMICAG.2020.105949)
rdf:subject
rdf:predicate
rdf:object
Stated in
wikip:P248
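A minimal sketch of the idea in this slide, assuming a plain in-memory representation rather than any actual HyperKnowledge or RDF library API: the claim carries its own identifier, so a second claim ("stated in") can use that identifier as its subject. The `Claim` class and `reify` helper are hypothetical; the hydroxychloroquine identifier is an assumed Wikidata ID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """A statement that has its own identity and can be the subject of other claims."""
    subject: str
    predicate: str
    obj: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


# The base claim, using the identifier shown in the diagram.
base = Claim("wiki:Q84263196", "wikip:P2176", "wiki:Q421094",  # Q421094: assumed ID
             id="cc00feb7-4b9b-121d-898b-7c6652b2b406")

# A claim about the claim: where it was stated.
provenance = Claim(base.id, "wikip:P248", "DOI:10.1016/J.IJANTIMICAG.2020.105949")


def reify(c: Claim) -> list:
    """Expand a claim into RDF-style reification triples keyed by the claim identifier."""
    return [
        (c.id, "rdf:subject", c.subject),
        (c.id, "rdf:predicate", c.predicate),
        (c.id, "rdf:object", c.obj),
    ]
```

Because `provenance` is itself a `Claim`, it also has an identifier and could in turn be qualified, for example with who asserted the provenance.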
Complex claims
The experimental protocol involved oral absorption of 200 mg of hydroxychloroquine three times a day for 10 days, for 20 patients whose average age was 51 years (σ=19)
Many claims involve many entities in complex relationships, and should be represented as such.
Topic mapping, frames (Minsky), KIF
[Diagram: a "Medical Protocol" topic linked to other topics through named roles]
substance: Hydroxychloroquine (wiki:Q421094)
disease: Covid-19 (wiki:Q84263196)
amount: 200 mg
frequency: 3x/day
duration: 10 days
groups: Group 1 and Group 2 (study group and control group)
size: 20 patients
age: μ=51, σ=19
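The frame-like structure above can be sketched as one record whose slots are named roles. This is only an illustration under stated assumptions: the slot names mirror the role labels in the diagram, the `group` slot collapses the study and control groups into the single group described in the sentence, the hydroxychloroquine identifier is an assumed Wikidata ID, and `total_dose_mg` is a hypothetical helper showing that structured slots can be computed over.

```python
# One n-ary claim expressed as a frame: a topic with role-labelled slots.
protocol = {
    "type": "Medical Protocol",
    "substance": "wiki:Q421094",      # hydroxychloroquine (assumed Wikidata ID)
    "disease": "wiki:Q84263196",      # Covid-19
    "amount": {"value": 200, "unit": "mg"},
    "frequency": {"times": 3, "per": "day"},
    "duration": {"value": 10, "unit": "day"},
    "group": {"size": 20, "age": {"mean": 51, "sd": 19}},
}


def total_dose_mg(frame: dict) -> int:
    """Total amount per patient: dose x doses per day x number of days."""
    return (frame["amount"]["value"]
            * frame["frequency"]["times"]
            * frame["duration"]["value"])


print(total_dose_mg(protocol))
```

A flat list of triples could carry the same information, but keeping the roles together on one topic makes it clear that the dose, frequency and duration all belong to the same protocol.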
Topic Map Structure
Some claims are hypothetical
If social distancing measures are not followed, we risk a second wave.
We need a way to represent hypothetical scenarios.
The hypothetical world is a whole separate universe of discourse, which we represent as a subgraph. (Sowa’s Conceptual graphs)
[Diagram: the hypothetical situation drawn as a subgraph]
Hypothetical situation: Social distancing norms, compliance level of target population < 80%
consequence: Event: infection rate, > 50 % rise
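One way to picture the subgraph idea, as a sketch only: statements live inside named contexts, and the actual world asserts nothing about the hypothetical content except the link between contexts. The `Context` class and `asserted` helper are hypothetical names for this example and do not reflect a specific conceptual graph implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A named universe of discourse holding its own statements."""
    name: str
    statements: list = field(default_factory=list)

    def add(self, subject, predicate, obj):
        self.statements.append((subject, predicate, obj))


actual = Context("actual world")

hypothetical = Context("hypothetical situation")
hypothetical.add("social distancing norms", "compliance level", "< 80%")

consequence = Context("consequence")
consequence.add("infection rate", "rise", "> 50%")

# Only the relation between the two contexts is asserted in the actual world.
actual.add(hypothetical.name, "consequence", consequence.name)


def asserted(ctx: Context, subject: str) -> list:
    """Statements about a subject that hold inside the given context."""
    return [s for s in ctx.statements if s[0] == subject]
```

Querying `actual` for "social distancing norms" returns nothing, which is the point: the low compliance level is only true inside the hypothetical context.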
Points of view should be explicit
Covid-19, as depicted by Fox News, is not more serious than a minor cold.
Epidemiologists’ estimates of Covid-19 transmission rates have not been explained by virologists.
The results of laboratory X have been contested.
Claims are made by agents, and adopted by communities. It is sometimes important to distinguish references to a topic as it is understood by a specific agent or community.
Each claim has to be identified as coming from a specific source, maintained by agents. The properties and links attributed to a topic can be different for each source. Source federation is explicit.
[Diagram: topic A as held by three sources, each with its own links (to B or C) and its own value (7, 6, 8)]
Claims are made and retracted
Lab X claimed to find reinfection after remission, but those cases were due to false negative testing in an asymptomatic phase.
People can change their minds; a claim can be a correction to an earlier claim.
We view the claims from a source as an event stream. Some events in the stream can explicitly contradict earlier events.
[Diagram: an event stream about topic A: A x B, A y 3, A x C, A y 5, A x D, ... with the resulting state of A shown as:]
{
  "@id": "A",
  "x": ["B", "D"],
  "y": 5
}
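The state above can be read as the result of replaying the stream. The sketch below is one possible reading, not the HyperKnowledge event format: the event tuple shape, the explicit `retract` operation, and the choice that `x` is multi-valued while `y` is single-valued are all assumptions made so that the replay ends in the state shown.

```python
from collections import defaultdict

# Assumed event shape: (operation, topic, property, value).
events = [
    ("assert", "A", "x", "B"),
    ("assert", "A", "y", 3),
    ("assert", "A", "x", "C"),
    ("assert", "A", "y", 5),     # supersedes y = 3
    ("retract", "A", "x", "C"),  # assumed explicit correction of an earlier event
    ("assert", "A", "x", "D"),
]

MULTI_VALUED = {"x"}  # assumption: x accumulates values, y keeps only the latest


def replay(stream):
    """Fold an event stream into the current state of each topic."""
    state = defaultdict(dict)
    for op, topic, prop, value in stream:
        node = state[topic]
        node["@id"] = topic
        if prop in MULTI_VALUED:
            values = node.setdefault(prop, [])
            if op == "assert" and value not in values:
                values.append(value)
            elif op == "retract" and value in values:
                values.remove(value)
        elif op == "assert":
            node[prop] = value
        elif node.get(prop) == value:
            del node[prop]
    return dict(state)


print(replay(events)["A"])
```

Because the log itself is never rewritten, replaying only a prefix of `events` gives the state as it stood before the correction.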
Different communities use different names or identifiers
English names for Covid-19 in Wikidata: 2019-nCoV acute respiratory disease ; coronavirus disease 2019 ; COVID19 ; COVID 19 ; Covid-19 ; 2019 novel coronavirus pneumonia ; Coronavirus disease 2019 ; nCOVD19 ; nCOVD 19 ; nCOVD-19 ; COVID-2019 ; seafood market pneumonia ; Wuhan pneumonia ; 2019 NCP ; WuRS ; severe acute respiratory syndrome type 2 ; SARS-CoV-2 infection ; 2019 novel coronavirus respiratory syndrome ; Wuhan respiratory syndrome ; novel coronavirus ; coronavirus
Of course we’d want to also search for 2019冠状病毒病 etc.
RDF identifiers in Wikidata:
<http://www.wikidata.org/wiki/Q84263196>
<https://catalogue.bnf.fr/ark:/12148/cb17874453m>
<https://d-nb.info/gnd/1206347392>
<https://id.loc.gov/authorities/sh2020000570>
<https://meshb.nlm.nih.gov/#/record/ui?ui=C000657245>
<http://id.nlm.nih.gov/mesh/T001007884>
<http://id.nlm.nih.gov/mesh/M000681578>
<https://www.courrierinternational.com/sujet/covid-19>
<http://www.disease-ontology.org/?id=DOID:0080600>
<http://www.diseasesdatabase.com/ddb60833.htm>
<http://emedicine.medscape.com/article/2500114-overview>
<https://www.britannica.com/science/COVID-19>
<https://www.enciclopedia.cat/EC-GEC-23470930.xml>
<https://icd.who.int/browse10/2019/en#/U07.1>
<https://icd.who.int/browse10/2019/en#/U07.2>
<https://icd.who.int/dev11/f/en#/http://id.who.int/icd/entity/1790791774>
<https://www.malacards.org/card/2019_novel_coronavirus>
<https://www.ne.se/uppslagsverk/encyklopedi/lång/covid-19>
<https://www.nhs.uk/conditions/coronavirus-covid-19>
<http://www.omegawiki.org/DefinedMeaning:1733730>
<https://philpapers.org/browse/covid-19>
<https://www.quora.com/topic/COVID>
<http://snomed.info/id/840539006>
<https://sml.snl.no/covid-19>
<https://www.reddit.com/r/Coronavirus/>
<https://www.reddit.com/r/COVID19/>
<http://www.treccani.it/enciclopedia/ricerca/COVID>
<https://tvtropes.org/pmwiki/pmwiki.php/UsefulNotes/CoronavirusDisease2019Pandemic>
<http://www.yso.fi/onto/yso/p38829>
<https://denstoredanske.lex.dk/COVID-19>
Note missing: kg:/m/01cpyy (Google)
Many concepts share the same name. Many names share the same concept.
Names have to be disambiguated. Global concept identifiers can be tentatively identified, but all identifiers are tagged with their source, and the identifier X as used by source A may not correspond to the concept referred to by X in source B.
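A small sketch of source-tagged name lookup, under the assumption of a simple in-memory index: a name maps to a set of (source, identifier) pairs rather than to a single global concept, so ambiguity stays visible. The `register` and `candidates` functions and the second source `source-B` with its identifier are hypothetical.

```python
from collections import defaultdict

# Normalised name -> set of (source, identifier) pairs.
index = defaultdict(set)


def register(source: str, name: str, identifier: str) -> None:
    """Record that a source uses this name for this identifier."""
    index[name.casefold()].add((source, identifier))


register("wikidata", "Covid-19", "wiki:Q84263196")
register("wikidata", "Wuhan pneumonia", "wiki:Q84263196")
register("wikidata", "coronavirus", "wiki:Q84263196")
# Hypothetical second source that uses the same name for a different concept.
register("source-B", "coronavirus", "source-B:virus-family")


def candidates(name: str) -> list:
    """All source-tagged identifiers a name may refer to; more than one means ambiguity."""
    return sorted(index.get(name.casefold(), set()))


print(candidates("Coronavirus"))
```

Deciding whether two candidates denote the same concept is left to topic merging; the lookup only reports what each source says.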
Unifying topics is the domain of topic mappings
Topic Map as a federation platform
Federating Silos: introduction
Federating Silos: Topic Mapping
Topic merging opens questions and creates events
Distributed federation in HyperKnowledge
Each source maintains its own table of topic merges, and federated queries must keep track of those equivalences.
This can be expanded (with normalization) to identification of composite topics.
The plan is for the HyperKnowledge (HK) ecosystem to maintain a probabilistic map (a Bloom filter) of which sources maintain information about which topics.
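The two mechanisms above can be sketched together, assuming nothing about the planned HyperKnowledge implementation: each source keeps its own merge table, and a small Bloom filter per source answers "might this source know this topic?" with possible false positives but no false negatives. The `BloomFilter` and `Source` classes, the `may_know` helper, and the parameters `size_bits` and `hashes` are illustrative choices.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class Source:
    """A source with its own table of topic merges and a summary of known topics."""

    def __init__(self, name: str):
        self.name = name
        self.merges = {}  # local identifier -> identifier it was merged into
        self.topics = BloomFilter()

    def merge(self, alias: str, canonical: str) -> None:
        self.merges[alias] = canonical
        self.topics.add(alias)
        self.topics.add(canonical)

    def resolve(self, identifier: str) -> str:
        return self.merges.get(identifier, identifier)


def may_know(sources, identifier: str) -> list:
    """Names of sources worth querying for this identifier."""
    return [s.name for s in sources if identifier in s.topics]


a, b = Source("A"), Source("B")
a.merge("mesh:C000657245", "wiki:Q84263196")
print(may_know([a, b], "wiki:Q84263196"))
```

A federated query would first call `may_know` to narrow the set of sources, then apply each source's own `resolve` before comparing results.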
Comparing claims
The research on hydroxychloroquine in study X was contradicted in study Y.
132/203 virologists consulted believe hydroxychloroquine’s side effects to be too severe for Covid-19 treatment.
Once claims have an identity, we can compare claims and make higher-level claims.
[Diagram: two claims compared by a higher-level claim]
Claim 1: Covid-19, drug used for treatment, Hydroxychloroquine (DOI:10.1016/J.ijantimicag.2020.105949)
Claim 2: Hydroxychloroquine, side-effect, refractory ventricular arrhythmia (DOI:10.1080/15563650500514558)
Risk/benefit analysis: risks (Claim 2), benefits (Claim 1), outcome: Risks outweigh benefits
Claim streams representing individual points of view can be combined into “community” streams, and into combined values.
So what can be a stream?
Comparing claims allows combining claims in larger aggregates
Opposite end of the spectrum: Casual small streams (like git branches)
Inference engine ecosystem
Event sourcing as a backbone for knowledge-based microservices
Services subscribe to claims and produce calculations; the main queue subscribes to those calculations.
Reactive calculations
E.g.: rule-based inference, live query maintenance, machine learning, inference combination, etc.
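The subscription pattern described here can be sketched with an in-memory, append-only log. This is a toy under stated assumptions, not the HyperKnowledge architecture: the `EventLog` class, the event kinds `"claim"` and `"calculation"`, the `contested_by` field and the `contradiction_service` rule are all hypothetical.

```python
from collections import defaultdict


class EventLog:
    """Append-only log that notifies subscribers by event kind."""

    def __init__(self):
        self.events = []
        self.subscribers = defaultdict(list)

    def subscribe(self, kind, handler):
        self.subscribers[kind].append(handler)

    def publish(self, kind, payload):
        self.events.append((kind, payload))
        for handler in list(self.subscribers[kind]):
            handler(payload)


log = EventLog()


def contradiction_service(claim):
    """Hypothetical rule-based service: reacts to claims, emits a calculation."""
    if claim.get("contested_by"):
        log.publish("calculation", {"claim": claim["id"], "status": "contested"})


log.subscribe("claim", contradiction_service)

# Stand-in for the main queue: it collects the calculations that services produce.
results = []
log.subscribe("calculation", results.append)

log.publish("claim", {"id": "claim-1", "contested_by": ["study Y"]})
log.publish("claim", {"id": "claim-2", "contested_by": []})
print(results)
```

Since calculations go back through the same log, other services can subscribe to them in turn, which is what allows inferences to be combined.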
Synthesis as a service
Synthesis can rely on simple statistics (who believes this), sample sizes, Bayesian methods, etc.
Simple awareness of which claims are established or contested (and by whom) is useful.
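The simplest form of synthesis mentioned above, a tally of who supports or contests a claim, might look like the sketch below. The `stances` records, the stance labels and the `synthesize` function are hypothetical and stand in for whatever a real synthesis service would consume.

```python
from collections import Counter

# Hypothetical stance records: (agent, claim id, stance).
stances = [
    ("lab-1", "claim-1", "supports"),
    ("lab-2", "claim-1", "contests"),
    ("lab-3", "claim-1", "supports"),
    ("lab-1", "claim-2", "supports"),
]


def synthesize(records, claim_id):
    """Summarise who stands where on one claim."""
    relevant = [(agent, stance) for agent, cid, stance in records if cid == claim_id]
    tally = Counter(stance for _, stance in relevant)
    total = sum(tally.values())
    contested = tally["supports"] > 0 and tally["contests"] > 0
    return {
        "supports": tally["supports"],
        "contests": tally["contests"],
        "share_supporting": tally["supports"] / total if total else None,
        "contested_by": sorted(a for a, s in relevant if s == "contests"),
        "status": "contested" if contested else "uncontested",
    }


print(synthesize(stances, "claim-1"))
```

Weighting by sample size or applying a Bayesian update would replace the plain tally while keeping the same input records.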
Augmented collaboration: start with a single-source view of a claim stream
Augmented collaboration: become aware of relevant claims from federation stream
From documents to augmenting knowledge work
HyperKnowledge
Documents
Structured Documents
Basic
claim discovery
Entity identification
Augmented Claim Craft
CoronaWhy
OpenSherlock 1
Spacy
!
Structured documents to claims with OpenSherlock
OpenSherlock: example sentence
The pandemic of obesity, type 2 diabetes mellitus (T2DM) and nonalcoholic fatty liver disease (NAFLD) has frequently been associated with dietary intake of saturated fats (1) and specifically with dietary palm oil (PO) (2).
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5272194/
OpenSherlock: expected claims from that sentence
Obesity associated with saturated fats
Obesity associated with palm oil
T2DM associated with saturated fats
Type 2 Diabetes mellitus has acronym T2DM
T2DM associated with palm oil
NAFLD associated with saturated fats
Nonalcoholic Fatty Liver Disease has acronym NAFLD
NAFLD associated with palm oil
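The eight expected claims follow from distributing the coordinated subjects over the coordinated objects and adding one claim per acronym definition. The sketch below only enumerates that expected output from hand-entered lists; it is not how OpenSherlock parses the sentence, and the names `conditions`, `factors` and `expected_claims` are introduced for this illustration.

```python
from itertools import product

# Subjects found in the sentence; the value is the spelled-out form when an acronym is defined.
conditions = {
    "Obesity": None,
    "T2DM": "Type 2 Diabetes mellitus",
    "NAFLD": "Nonalcoholic Fatty Liver Disease",
}
factors = ["saturated fats", "palm oil"]


def expected_claims() -> list:
    """Every subject paired with every object, plus one claim per acronym."""
    claims = [f"{c} associated with {f}" for c, f in product(conditions, factors)]
    claims += [f"{full} has acronym {short}"
               for short, full in conditions.items() if full]
    return claims


for claim in expected_claims():
    print(claim)
```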
“Obesity associated with saturated fats”: The Predicate
“Obesity associated with saturated fats”: The Object
“Obesity associated with saturated fats”: The Subject
Next steps
Higher-order claims are still beyond current NLP techniques, but deep learning tools can augment the intelligence of researchers identifying claims, and symbolic AI can be used to identify logical connections and contradictions.
The HyperKnowledge federation can help researchers craft higher-order claims by identifying both the logical and social neighbourhood of claims.
We would like this ecosystem to be how the next Drs. Liddelow and Trumble get to be aware of one another.
References
RDF, W3C
Wikidata data model primer
Patrick Durusau, Steven R. Newcomb, and Robert Barta. Topic Maps Reference Model. ISO standard 13250-5 CD, November 2007.
John F. Sowa. Conceptual Graphs. In Handbook of Knowledge Representation, pages 213–237. Elsevier, 2008. ISBN 9780444522115.
Knowledge Interchange Format, Stanford