Task Roadmap
Task 1T: Therapeutics
Subtask 1T.1
Drug treatment extraction (Deliverable 1)
SST 1T.1.1.1
- Build first-pass lexicon of "treatment" synonyms
SST 1T.1.1.2
- Refine lexicon of "treatment" synonyms (use pre-trained Word2Vec model?)
SST 1T.1.2.1
- Get training data for drug-"treats" indication task
SST 1T.1.2.2
- Extract drug-"treats" indication in COVID papers with co-occurrences
SST 1T.1.2.3
- Extract drug-"treats" indication in COVID papers with syntax (dependency) parsing
SST 1T.1.2.4
- Extract drug-"treats" indication in COVID papers with DL (approach 1 - BERT)
SST 1T.1.2.5
- Extract drug-"treats" indication in COVID papers with DL (approach 2)
SST 1T.1.2.6
- Evaluate performance of all drug-"treats" models, select best model
SST 1T.1.2.7
- Incorporate best drug-"treats" model into final pipeline
SST 1T.1.5.1
- Hedge detection in treatment sentences using co-occurrence
SST 1T.1.5.2
- Hedge detection in treatment sentences using syntax parsing
SST 1T.1.5.3
- Hedge detection in treatment sentences using DL
SST 1T.1.5.4
- Evaluate performance of all hedge detection models, select best model
SST 1T.1.5.5
- Incorporate best hedge detection model into final pipeline
SST 1T.1.6.1
- Negation detection in treatment sentences using co-occurrence
SST 1T.1.6.2
- Negation detection in treatment sentences using syntax parsing
SST 1T.1.6.3
- Negation detection in treatment sentences using DL
SST 1T.1.6.4
- Evaluate performance of all negation detection models, select best model
SST 1T.1.6.5
- Incorporate best negation detection model into final pipeline
SST 1T.1.7
- Refine the drug lexicon (use intersection of multiple lexicons, pre-filter somehow?)
SST 1T.1.8.1
- Collect metadata about drugs
SST 1T.1.8.2
- Join and incorporate metadata into final pipeline
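The co-occurrence baselines listed above (SST 1T.1.2.2, 1T.1.5.1, 1T.1.6.1) could be sketched roughly as follows. This is only an illustrative starting point, not the project's pipeline: the four lexicons are hypothetical toy sets (real ones would come from SST 1T.1.1.x and 1T.1.7), and the function names are made up for this example.

```python
import re

# Hypothetical toy lexicons, for illustration only.
TREATMENT_TERMS = {"treat", "treats", "treated", "treatment", "therapy"}
DRUG_TERMS = {"remdesivir", "hydroxychloroquine", "lopinavir"}
HEDGE_CUES = {"may", "might", "could", "possibly", "suggest", "suggests"}
NEGATION_CUES = {"not", "no", "never", "failed", "without"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cooccurrence_hits(sentence):
    """Return one record per drug that co-occurs with a treatment term.

    Hedge and negation flags are sentence-level: they only say that a cue
    word appears somewhere in the same sentence, not that it scopes over
    the drug mention.
    """
    tokens = set(tokenize(sentence))
    if not tokens & TREATMENT_TERMS:
        return []
    hedged = bool(tokens & HEDGE_CUES)
    negated = bool(tokens & NEGATION_CUES)
    return [
        {"drug": drug, "hedged": hedged, "negated": negated}
        for drug in sorted(tokens & DRUG_TERMS)
    ]
```

Because the flags ignore scope, a baseline like this would be expected to over-flag; the syntax-parsing and DL variants in the roadmap are the natural places to address that.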
Subtask 1T.2
Model organism extraction (Deliverable 1)
SST 1T.2.1
- Explore animal model mentions in CORD-19 (will spawn subtasks)
Subtask 1T.3
Paper-level annotation of "computational", "experimental", "clinical" (Deliverable 1)
SST 1T.3.1
- Manually annotate 1000 papers as containing "computational", "experimental (lab)", or "clinical" evidence
SST 1T.3.2.1
- Automatically annotate papers' evidence types based on co-occurrence
SST 1T.3.2.2
- Automatically annotate papers' evidence types using an unsupervised learning approach
SST 1T.3.2.3
- Automatically annotate papers' evidence types based on DL (approach 1 - BERT)
SST 1T.3.2.4
- Automatically annotate papers' evidence types based on DL (approach 2)
SST 1T.3.2.5
- Evaluate performance of all evidence classifiers, select best model
SST 1T.3.2.6
- Incorporate best evidence classifier model into final pipeline
SST 1T.3.5.1
- Extract methods being used using topic modeling over "Methods" section of paper (TF-IDF?)
SST 1T.3.5.2
- Contingent on success of 1T.3.5.1, incorporate into final pipeline
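For SST 1T.3.5.1, one possible first experiment is to rank the terms in each paper's "Methods" text by TF-IDF weight. Note that TF-IDF is a term-weighting scheme rather than a topic model, so this sketch only surfaces candidate method terms. The function names and the tokenization rule are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep tokens of two or more characters starting with a letter."""
    return re.findall(r"[a-z][a-z0-9-]+", text.lower())


def top_tfidf_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document.

    TF is the relative frequency within the document; IDF is
    log(N / document frequency). Ties are broken alphabetically.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results
```

Terms that appear in every document get a score of zero, which is why common words drop to the bottom of each ranking without an explicit stop-word list.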
Subtask 1T.4
COVID mechanism of action knowledge graph creation (Deliverable 2)
SST 1T.4.1
- Create a Neo4j KG from the Krogan paper*
SST 1T.4.2
- Consider incorporating STRING PPI relations into Neo4j
SST 1T.4.3.1
- Get training data for relation extraction from text
SST 1T.4.3.2
- Extract PPI relations from text using co-occurrence
SST 1T.4.3.3
- Extract PPI relations from text using syntax parsing
SST 1T.4.3.4
- Extract PPI relations from text using ML (approach 1)
SST 1T.4.3.5
- Extract PPI relations from text using BERT (approach 2)
SST 1T.4.4
- Explore tissue specificity of PPI relations in BERT (will spawn subtasks)
SST 1T.4.5
- Explore drug-protein interactions in CORD dataset (will spawn subtasks)
SST 1T.4.6
- Incorporate all relationships extracted into Neo4j KG
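As a rough illustration of how the co-occurrence step (SST 1T.4.3.2) might feed the graph load (SST 1T.4.6), the sketch below counts sentence-level protein co-mentions and shapes them as rows for a parameterized Cypher query. The protein set, the `Protein` label, the `CO_MENTIONED_WITH` relationship type, and the property names are all assumptions for this example, not a schema taken from the roadmap.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical protein lexicon; a real one would come from a curated source.
PROTEINS = {"ACE2", "TMPRSS2", "NSP5"}

# Parameterized Cypher (assumed schema). Passing rows as the $rows parameter
# avoids building query strings from extracted text.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (a:Protein {name: row.source})
MERGE (b:Protein {name: row.target})
MERGE (a)-[r:CO_MENTIONED_WITH]->(b)
SET r.sentence_count = row.count
"""


def cooccurrence_rows(sentences, proteins=PROTEINS):
    """Count sentences in which each pair of known proteins is co-mentioned."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        found = sorted(tokens & proteins)
        for source, target in combinations(found, 2):
            counts[(source, target)] += 1
    return [
        {"source": source, "target": target, "count": n}
        for (source, target), n in sorted(counts.items())
    ]
```

The rows would then be handed to whatever Neo4j client the project uses, together with `LOAD_QUERY`. Pairs are stored in alphabetical order, so the relationship direction carries no meaning here.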
*Krogan paper: https://www.biorxiv.org/content/10.1101/2020.03.22.002386v1.full.pdf
Subtask 1T.5
Posology information extraction (Deliverable 3)
SST 1T.5.1
- Filter documents (primary clinical literature)
SST 1T.5.2
- Extract Dosing Information (MedEx)
SST 1T.5.3
- Filter MedEx annotation errors (get rid of irrelevant info)
SST 1T.5.4
- Extract clinical trial ID numbers
SST 1T.5.5
- Evaluate results (for each step)
SST 1T.5.6
- Repeat pipeline SST 1T.5.1-1T.5.5 for all posology columns (will possibly spawn subtasks)
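The clinical trial ID step (SST 1T.5.4) lends itself to a simple pattern match. The sketch below covers only ClinicalTrials.gov identifiers, which take the form "NCT" followed by eight digits; other registries would need their own patterns, and the function name is an assumption for illustration.

```python
import re

# ClinicalTrials.gov identifiers: "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def extract_trial_ids(text):
    """Return the distinct NCT identifiers in text, in order of first appearance."""
    seen = []
    for match in NCT_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

The word boundaries keep the pattern from matching inside longer alphanumeric strings, such as an ID followed by extra digits.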
Subtask 1T.MISC
Task V&T infrastructure
SST 1T.MISC.1.1
- Increase formality of language in testing framework doc on GitHub wiki
SST 1T.MISC.1.2
- Implement improved testing framework
SST 1T.MISC.2
- Brainstorm and set up an evaluation pipeline for ML components
SST 1T.MISC.3
- Clean up GitHub repo (factor repeated code into helper functions)
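For the evaluation pipeline in SST 1T.MISC.2, and for the "evaluate all models, select best model" steps elsewhere in the roadmap, a minimal sketch could compare each model's predicted items against a gold set and pick the model with the highest F1. Selecting on F1 is an assumption here; the roadmap does not say which metric decides. The function names are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute set-based precision, recall, and F1 for one model's output."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def select_best(gold, model_outputs):
    """Score every model and return (best_model_name, all_scores).

    model_outputs maps a model name to its iterable of predicted items.
    The best model is the one with the highest F1 (an assumed criterion).
    """
    scores = {
        name: precision_recall_f1(gold, predictions)
        for name, predictions in model_outputs.items()
    }
    best = max(scores, key=lambda name: scores[name][2])
    return best, scores
```

Items can be any hashable value, for example (drug, indication) tuples for the drug-"treats" task or (paper ID, evidence type) pairs for the evidence classifiers.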