CINECA WP3 - Text Mining Integrated Pipeline - High Level Schematic Diagram
Models
-LexMapr
-TM-Pipeline for CINECA
-SORTA (by Chao Pang)
-Coming ...
Simplified diagram of HES-SO/SIB text mining workflow
Input: CINECA cohort free text
-Spelling corrector
-Normalization pipeline:
  -Exact match module: look up the corrected text in the Dictionary* (UMLS, lookup table); a match to exactly one concept returns its CUI.
  -Learning to rank module (text still not normalized): get more than two candidates from MetaMap and reorder them with learning to rank; the new candidate order gives the CUI.
Output: normalized free text (CUIs)
* N2C2, MedMentions data
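The cascade in the HES-SO/SIB workflow can be sketched as follows. This is a minimal Python sketch, not the HES-SO/SIB implementation: it assumes a toy in-memory dictionary, a placeholder spelling corrector that only normalizes case and whitespace, a hypothetical `toy_candidates` function standing in for MetaMap, and character-level similarity from `difflib` swapped in for the trained learning-to-rank model. It shows only the control flow: accept an exact match to exactly one concept, otherwise rerank generated candidates.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Tuple[str, str]  # (CUI, concept name)

# Hypothetical toy dictionary: normalized term -> list of CUIs.
# The real dictionary is built from UMLS plus a lookup table.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "headache": ["C0018681"],
    "back pain": ["C0004604"],
}


def correct_spelling(text: str) -> str:
    """Placeholder spelling corrector: only lower-cases and collapses whitespace."""
    return " ".join(text.lower().split())


def exact_match(text: str, dictionary: Dict[str, List[str]]) -> Optional[str]:
    """Return a CUI only when the term maps to exactly one concept."""
    cuis = dictionary.get(text, [])
    return cuis[0] if len(cuis) == 1 else None


def rerank(text: str, candidates: List[Candidate]) -> List[Candidate]:
    """Stand-in for learning to rank: orders candidates by string similarity
    to the input text instead of using a trained ranking model."""
    return sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, text, c[1].lower()).ratio(),
        reverse=True,
    )


def normalize(
    term: str,
    dictionary: Dict[str, List[str]],
    generate_candidates: Callable[[str], List[Candidate]],
) -> Optional[str]:
    """Exact match first; if still not normalized, rerank generated candidates."""
    text = correct_spelling(term)
    cui = exact_match(text, dictionary)
    if cui is not None:
        return cui
    candidates = generate_candidates(text)  # MetaMap's role in the diagram
    if not candidates:
        return None
    return rerank(text, candidates)[0][0]


def toy_candidates(text: str) -> List[Candidate]:
    """Hypothetical candidate generator standing in for MetaMap."""
    return [("C4553197", "Headache, CTCAE"), ("C0018681", "Headache")]
```

For example, `normalize("HEADACHE", TOY_DICTIONARY, toy_candidates)` resolves through the exact match module, while `"Headaches"` misses the dictionary and falls through to reranking, where "Headache" outranks "Headache, CTCAE".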
API concept
Concept type:
-disease
-drug
-gender
-procedure
Model:
-HES-SO/SIB
-LexMapr
-SORTA
-Zooma/EBI
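A request to this API could be represented as sketched below. The class, function, and field names (`NormalizationRequest`, `ConceptType`, `Model`, `from_strings`) are hypothetical; only the concept types and model names come from the API concept list.

```python
from dataclasses import dataclass
from enum import Enum


class ConceptType(Enum):
    DISEASE = "disease"
    DRUG = "drug"
    GENDER = "gender"
    PROCEDURE = "procedure"


class Model(Enum):
    HES_SO_SIB = "HES-SO/SIB"
    LEXMAPR = "LexMapr"
    SORTA = "SORTA"
    ZOOMA_EBI = "Zooma/EBI"


def _lookup(enum_cls, name: str):
    """Case-insensitive lookup of an enum member by its value."""
    for member in enum_cls:
        if member.value.lower() == name.strip().lower():
            return member
    raise ValueError(f"unknown {enum_cls.__name__}: {name!r}")


@dataclass(frozen=True)
class NormalizationRequest:
    """Hypothetical request object: free text plus the chosen concept type and model."""
    free_text: str
    concept_type: ConceptType
    model: Model

    @classmethod
    def from_strings(cls, free_text: str, concept_type: str, model: str) -> "NormalizationRequest":
        """Validate raw form input; raises ValueError for empty text or unknown options."""
        if not free_text.strip():
            raise ValueError("free text must not be empty")
        return cls(free_text, _lookup(ConceptType, concept_type), _lookup(Model, model))
```

Using enums keeps the set of accepted concept types and models in one place, so an unsupported option is rejected before any model is called.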
Input: free text, or free text/semi-structured text loaded from a local file
Output: normalized free text
API sketch: https://wireframe.cc/Vqu0ij
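Loading terms from a local file could look like the sketch below. The API sketch does not specify a file format, so this assumes either plain text with one term per line, or a tab-separated file (`.tsv`) with a header row and a hypothetical `text` column holding the terms.

```python
import csv
from pathlib import Path
from typing import List


def load_terms(path: str, column: str = "text") -> List[str]:
    """Load terms to normalize from a local file (formats are assumptions).

    *.tsv : tab-separated with a header row; terms are read from `column`
    other : plain text, one term per non-empty line
    """
    p = Path(path)
    if p.suffix.lower() == ".tsv":
        terms = []
        with p.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                value = (row.get(column) or "").strip()  # skip empty/missing cells
                if value:
                    terms.append(value)
        return terms
    with p.open(encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Each loaded term can then be sent to the selected model exactly like text typed into the web form.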
API input/output example (HES-SO/SIB)
Web API input:
Free text: HEADACHE, BACK PAINS
Concept type: Disease
Model: HES-SO/SIB (normalization pipeline)
Output:
Output ontology | Concept code | Concept name | Normalization score |
UMLS | C0018681 | Headache | 0.5958 |
UMLS | C0004604 | Back Pain | 0.4882 |
UMLS | C4553197 | Headache, CTCAE | 0.3011 |
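A client consuming this output might parse the rows and order them by normalization score, as sketched below. The pipe-delimited layout and rows are copied from the example; the `Mapping` type is hypothetical, and since the meaning of the score is not defined here, any cutoff applied to it is an illustrative assumption.

```python
from typing import List, NamedTuple

# Rows copied from the HES-SO/SIB example output.
SAMPLE = """
Output ontology | Concept code | Concept name | Normalization score |
UMLS | C0018681 | Headache | 0.5958 |
UMLS | C0004604 | Back Pain | 0.4882 |
UMLS | C4553197 | Headache, CTCAE | 0.3011 |
"""


class Mapping(NamedTuple):
    ontology: str
    code: str
    name: str
    score: float


def parse_results(table: str) -> List[Mapping]:
    """Parse a pipe-delimited result table (header row first), best score first."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # skip the header row
        ontology, code, name, score = (c.strip() for c in line.strip().strip("|").split("|"))
        rows.append(Mapping(ontology, code, name, float(score)))
    return sorted(rows, key=lambda m: m.score, reverse=True)
```

With a hypothetical cutoff of 0.4, `[m.name for m in parse_results(SAMPLE) if m.score >= 0.4]` keeps `["Headache", "Back Pain"]` and drops "Headache, CTCAE".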
API input/output example (LexMapr)
API input/output example: ZOOMA
Input:
Label | Type | !Curated data sources | !Ontologies |
diabetes | disease | | |
Output:
Term Type | Term Value | Ontology Class Label | Mapping Confidence | Ontology Class ID | Source |
disease | diabetes | diabetes mellitus | High | EFO_0000400 | https://www.ebi.ac.uk/gxa |
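The same query can be sent to ZOOMA's REST service, as in the sketch below. It assumes the ZOOMA v2 `annotate` endpoint with `propertyValue` and `propertyType` query parameters, and assumes each returned annotation carries `semanticTags` (ontology class URIs) and `confidence` fields; check both against the current ZOOMA documentation. The ontology class label shown in the table would need a separate ontology lookup and is not retrieved here.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ZOOMA v2 endpoint; verify against the current ZOOMA documentation.
ZOOMA_ANNOTATE = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def annotate_url(value: str, value_type: str = "") -> str:
    """Build the GET URL that asks ZOOMA to annotate one (optionally typed) value."""
    params = {"propertyValue": value}
    if value_type:
        params["propertyType"] = value_type
    return f"{ZOOMA_ANNOTATE}?{urlencode(params)}"


def parse_annotations(annotations: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Turn decoded JSON annotations into (ontology class ID, confidence) pairs.

    Assumes each annotation has "semanticTags" (a list of class URIs) and
    "confidence"; the class ID is taken as the last path segment of the URI.
    """
    pairs = []
    for ann in annotations:
        for uri in ann.get("semanticTags", []):
            class_id = uri.rstrip("/").rsplit("/", 1)[-1]
            pairs.append((class_id, ann.get("confidence", "")))
    return pairs


def annotate(value: str, value_type: str = "") -> List[Tuple[str, str]]:
    """Query ZOOMA over HTTP (needs network access) and parse the response."""
    with urlopen(annotate_url(value, value_type), timeout=30) as resp:
        return parse_annotations(json.load(resp))
```

`annotate("diabetes", "disease")` mirrors the input row above and returns (class ID, confidence) pairs comparable to the Ontology Class ID and Mapping Confidence columns.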