Prototype V&T Deliverables

1	Task Roadmap
2
3	Task 1T: Therapeutics
4
5	Subtask 1T.1	Drug treatment extraction (Deliverable 1)
6	SST 1T.1.1.1	- Build first-pass lexicon of "treatment" synonyms
7	SST 1T.1.1.2	- Refine lexicon of "treatment" synonyms (use pre-trained Word2Vec model?)
8	SST 1T.1.2.1	- Get training data for drug-"treats" indication task
9	SST 1T.1.2.2	- Extract drug-"treats" indication in COVID papers with co-occurrences
10	SST 1T.1.2.3	- Extract drug-"treats" indication in COVID papers with syntax (dependency) parsing
11	SST 1T.1.2.4	- Extract drug-"treats" indication in COVID papers with DL (approach 1 - BERT)
12	SST 1T.1.2.5	- Extract drug-"treats" indication in COVID papers with DL (approach 2)
13	SST 1T.1.2.6	- Evaluate performance of all drug-"treats" models, select best model
14	SST 1T.1.2.7	- Incorporate best drug-"treats" model into final pipeline
15	SST 1T.1.5.1	- Hedge detection in treatment sentences using co-occurrence
16	SST 1T.1.5.2	- Hedge detection in treatment sentences using syntax parsing
17	SST 1T.1.5.3	- Hedge detection in treatment sentences using DL
18	SST 1T.1.5.4	- Evaluate performance of all hedge detection models, select best model
19	SST 1T.1.5.5	- Incorporate best hedge detection model into final pipeline
20	SST 1T.1.6.1	- Negation detection in treatment sentences using co-occurrence
21	SST 1T.1.6.2	- Negation detection in treatment sentences using syntax parsing
22	SST 1T.1.6.3	- Negation detection in treatment sentences using DL
23	SST 1T.1.6.4	- Evaluate performance of all negation detection models, select best model
24	SST 1T.1.6.5	- Incorporate best negation detection model into final pipeline
25	SST 1T.1.7	- Refine the drug lexicon (use intersection of multiple lexicons, pre-filter somehow?)
26	SST 1T.1.8.1	- Collect metadata about drugs
27	SST 1T.1.8.2	- Join and incorporate metadata into final pipeline
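A minimal sketch of the co-occurrence baseline named in SST 1T.1.2.2, assuming sentence-level matching against small in-memory lexicons. The lexicon contents and the function names `tokenize` and `cooccurrence_treats` are hypothetical placeholders, not the project's actual resources; the real lexicons would come from SST 1T.1.1.x and SST 1T.1.7.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicons for illustration only.
TREATMENT_TERMS = {"treat", "treats", "treated", "treatment", "therapy"}
DRUG_LEXICON = {"remdesivir", "hydroxychloroquine"}
INDICATION_LEXICON = {"covid-19", "sars-cov-2"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into alphanumeric/hyphen tokens."""
    return re.findall(r"[a-z0-9][a-z0-9\-]*", sentence.lower())


def cooccurrence_treats(sentence: str) -> List[Tuple[str, str]]:
    """Return (drug, indication) pairs that co-occur with a treatment term.

    A pair is emitted only if the sentence contains at least one treatment
    synonym, one drug, and one indication. Word order is ignored.
    """
    tokens = set(tokenize(sentence))
    if not tokens & TREATMENT_TERMS:
        return []
    drugs = sorted(tokens & DRUG_LEXICON)
    indications = sorted(tokens & INDICATION_LEXICON)
    return [(drug, indication) for drug in drugs for indication in indications]


if __name__ == "__main__":
    print(cooccurrence_treats("Remdesivir was used to treat COVID-19 patients."))
```

Because this baseline ignores word order, hedging, and negation, it can serve only as a reference point for the syntax-based and DL approaches evaluated in SST 1T.1.2.6.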
28
29
30	Subtask 1T.2	Model organism extraction (Deliverable 1)
31
32	SST 1T.2.1	- Explore animal model mentions in CORD-19 (will spawn subtasks)
33
34	Subtask 1T.3	Paper-level annotation of "computational", "experimental", "clinical" (Deliverable 1)
35
36	SST 1T.3.1	- Manually annotate 1000 papers as containing "computational", "experimental (lab)", or "clinical" evidence
37	SST 1T.3.2.1	- Automatically annotate papers' evidence types based on co-occurrence
38	SST 1T.3.2.2	- Automatically annotate papers' evidence types using an unsupervised learning approach
39	SST 1T.3.2.3	- Automatically annotate papers' evidence types based on DL (approach 1 - BERT)
40	SST 1T.3.2.4	- Automatically annotate papers' evidence types based on DL (approach 2)
41	SST 1T.3.2.5	- Evaluate performance of all evidence classifiers, select best model
42	SST 1T.3.2.6	- Incorporate best evidence classifier model into final pipeline
43	SST 1T.3.5.1	- Extract methods being used using topic modeling over "Methods" section of paper (TF-IDF?)
44	SST 1T.3.5.2	- Contingent on success of 1T.3.5.1, incorporate into final pipeline
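A minimal sketch of the co-occurrence approach in SST 1T.3.2.1, assuming a paper is labeled with every evidence type whose cue phrases appear in its text. The cue lists, the `min_hits` threshold, and the function name `annotate_evidence_types` are hypothetical; real cue lists would likely be derived from the manual annotations in SST 1T.3.1.

```python
from typing import Dict, List, Set

# Hypothetical cue phrases per evidence type, for illustration only.
EVIDENCE_CUES: Dict[str, Set[str]] = {
    "computational": {"in silico", "molecular docking", "simulation"},
    "experimental": {"in vitro", "cell culture", "assay"},
    "clinical": {"patients", "randomized", "clinical trial"},
}


def annotate_evidence_types(text: str, min_hits: int = 1) -> List[str]:
    """Return the sorted evidence labels whose cues occur at least min_hits times.

    Matching is case-insensitive substring counting, so "assay" also
    matches "assays". A paper may receive more than one label.
    """
    lowered = text.lower()
    labels = []
    for label, cues in EVIDENCE_CUES.items():
        hits = sum(lowered.count(cue) for cue in cues)
        if hits >= min_hits:
            labels.append(label)
    return sorted(labels)


if __name__ == "__main__":
    print(annotate_evidence_types("Molecular docking and in vitro assays were performed."))
```

The multi-label output follows the wording of SST 1T.3.1, where a paper is annotated as "containing" one or more kinds of evidence.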
45
46	Subtask 1T.4	COVID mechanism of action knowledge graph creation (Deliverable 2)
47
48	SST 1T.4.1	- Create a Neo4j KG from the Krogan paper*
49	SST 1T.4.2	- Consider incorporating STRING PPI relations into Neo4j
50	SST 1T.4.3.1	- Get training data for relation extraction from text
51	SST 1T.4.3.2	- Extract PPI relations from text using co-occurrence
52	SST 1T.4.3.3	- Extract PPI relations from text using syntax parsing
53	SST 1T.4.3.4	- Extract PPI relations from text using ML (approach 1)
54	SST 1T.4.3.5	- Extract PPI relations from text using BERT (approach 2)
55	SST 1T.4.4	- Explore tissue specificity of PPI relations in BERT (will spawn subtasks)
56	SST 1T.4.5	- Explore drug-protein interactions in CORD dataset (will spawn subtasks)
57	SST 1T.4.6	- Incorporate all extracted relationships into Neo4j KG
58
59		*Krogan paper: https://www.biorxiv.org/content/10.1101/2020.03.22.002386v1.full.pdf
60
61	Subtask 1T.5	Posology information extraction (Deliverable 3)
62
63	SST 1T.5.1	- Filter documents (primary clinical literature)
64	SST 1T.5.2	- Extract Dosing Information (MedEx)
65	SST 1T.5.3	- Filter MedEx annotation errors (get rid of irrelevant info)
66	SST 1T.5.4	- Extract clinical trial ID numbers
67	SST 1T.5.5	- Evaluate results (for each step)
68	SST 1T.5.6	- Repeat pipeline SST 1T.5.1-1T.5.5 for all posology columns (may spawn subtasks)
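A minimal sketch of SST 1T.5.4, assuming the IDs of interest are ClinicalTrials.gov identifiers, which have the form "NCT" followed by eight digits. The function name `extract_nct_ids` is hypothetical, and IDs from other trial registries would need their own patterns.

```python
import re
from typing import List

# ClinicalTrials.gov identifiers: the letters "NCT" followed by eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def extract_nct_ids(text: str) -> List[str]:
    """Return unique NCT identifiers in order of first appearance."""
    seen: List[str] = []
    for match in NCT_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


if __name__ == "__main__":
    sample = "Registered as NCT04280705 and NCT04257656; see NCT04280705."
    print(extract_nct_ids(sample))
```

Deduplicating while preserving order keeps the output stable, which makes the per-step evaluation in SST 1T.5.5 easier to compare across runs.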
69
70
71	Subtask 1T.MISC	Task V&T infrastructure
72
73	SST 1T.MISC.1.1	- Increase formality of language in testing framework doc on GitHub wiki
74	SST 1T.MISC.1.2	- Implement improved testing framework
75	SST 1T.MISC.2	- Brainstorm and set up an evaluation pipeline for ML components
76	SST 1T.MISC.3	- Clean up GitHub repo (extract shared logic into helper functions)