Question 1: Who should CFDE target for training and why?
Question 2: What types of training activities could CFDE offer?
Question 3: What materials should be delivered in the training? Should we provide basic training in Data Science? Specific training on how to use data from a CF program? Training on how to combine data from across CF programs?
Question 4: How can we better coordinate and integrate training activities among the DCCs and partnership projects over the next year?
Table 1
Data scientists, basic scientists -- beginner to expert | Create a "stackexchange" knowledgebase for CFDE. Most data science needs I have are met by a stackexchange search. | A "stackexchange" site for CFDE. Leverage free data science courses. Unique content on combining CFDE resources should be a focus area. | Use self-organizing best practices for producing valuable answers on how to solve problems -- incentivizing contribution. Again, look at stackexchange and other knowledgebase models.
Industry scientists and academic scientists -- also for graduate students with thesis-ready use cases | Curated set of general resources, leveraging open courses -- then CFDE- and DCC-specific courses. | Tutorials, YouTube tutorial videos, Jupyter notebooks, R and Python notebooks. | Put money behind it.
CFDE members - to get immediate feedback, be our own beta testers | best practices in informatics and engineering | Focus on what only CFDE can do, leverage external resources for everything else. Do NOT focus on basic data science training. | Identify together who our users are and who can use training across DCCs
Both CFDE investigators and staff. | Focus on what only CFDE can do, leverage external resources for everything else. Do NOT focus on basic data science training. | Have hands-on sessions dedicated to one DCC at a time -- with all of the DCCs participating.
Graduate students from wet labs to better identify experimental targets. | training program for future DCC leaders -- potentially K-award | Hackathons end up being de facto training activities
"Data parasites" -- knowledge discovery using other people's data | Focus on making use cases doable (reproducible) in a short period of time by non-experts
Sharing people between projects (DCCs)
share/expose existing CFDE resources at DCCs

Table 2
Our training should focus on what data we have, which is something that crosses all experience levels. So I don't think we should focus on a particular experience level of users. Junior investigators are the ones that typically use the data, but the PI level people need to understand the data in order to direct their research. We should target both wet lab and computational biologists. | We should provide 1) pre-recorded videos, 2) live webinars, and 3) workshops (both in-person and virtual). | We should deliver training videos, presentations, and runnable coding environments (e.g., Jupyter notebooks). No, we should not provide training in basic data science -- that is a very large subject and there are likely organizations better able to do that than us. We should provide training on how to use CF data. Combining data across CF programs is an unsolved research problem with significant pitfalls. We cannot provide a recipe for how to do that, but we should present pitfalls and considerations.
Researchers who are interested in CF datasets. | 1) Datasets that are available 2) How to access those datasets 3) Examples on how to use the datasets and available pipelines and tools | How to use the data from the CF program and how to leverage other CF datasets to combine them for analysis | Keep it simple. Can we define one or two training events to which we can contribute?
Graduate students, postdocs, and MD/PhD trainees. Less frequently established PIs. | Presentations that are overviews of data. How to use the CFDE portal. Hands-on workshops for using CF data in cloud environments like Terra. | We may need to provide basic training in data science, but only in the context of CF datasets. | Central tracker of CFDE training activities
Pharma/industry researchers | Train on use of APIs versus web interface. | We should provide API documentation.
Medical trainees | Training with the use case collection. What is missing from the CFDE portal?
Other CFDE DCCs | Simple cloud workshops with one or two of the datasets.
Why: People who are going to make use of the datasets and need them | What happens behind the scenes? For example, how is the data normalized?
Train course instructors (e.g., professors). | Hackathons.

Table 3
All members of internal CFDE to be aware of what other groups are doing. | Academic course in such topics as a clinical modeler. | Training videos in a centralized location, perhaps on a YouTube channel. Ensure a logical order/grouped modules in which the videos should be watched. | Working group: develop plan for moving forward
Other NIH programs that have cross-sectional research activities. So they can learn more about what groups are offering, can pull together data, etc. | Workshops | Tutorial documents | Training hackathon
External users for awareness, perspective, and feedback loop. | Web-based platform training | Jupyter notebooks | Research what communities can/should be served; gathering feedback
Ensure PMs from all domains are aware of the process. | Hackathons with targeted challenges that result in a prize. | Place content or tutorials on sites such as biostars.org
Data scientists at universities to get them interested. Develop courses around the data. | Create a forum where teams can present on what they're doing. Solicit researchers to speak. | Longer term: development of academic courses (e.g. SPARC - RFA-DK-20-020)
Bioinformatics, Clinical informatics students, Physician scientists | Series of courses/modules. | Coursera trainings that provide a certificate of completion.
Train the trainer (like the Software and Data Carpentry philosophy), where trainers can take training back to their local institution | FAIR training material - ability to utilize material developed by the DCCs from, for example, a CFDE training portal

Table 4
Clinicians and practitioners / translational medicine | Online learning courses | Videos | Money
Patients (appears to be a demand) | Hackathons (solve a demo problem) | Scrolli maps | Coordination group
Junior researchers (build a demand) | Pathways to answer common questions / data types and how they connect to others to answer questions | MOOC (Coursera, Canvas) - Massive open online courses | Focus on partnership projects
PhD students (build a demand) | Workshops face-to-face (become MOOCs) | Jupyter notebooks | Having dedicated training courses for integrating multiple datasets
Medical students (build a demand) | R Markdown documents
Project communities (focus on domains, problems)
Data scientists with biological interest
Bioinformatics
Database providers

Table 5
Thought: define a cohort to track over time (e.g., medical students) | "See one, do one, teach one" model similar to medical school. Specific training is important to empower the trainee to perform the types of analyses he/she cares about. | Use metrics similar to those used by T32s
Teach the teacher: the impact of training bioinformaticians may multiply. | We're better off training people to teach in order to scale | Generic training for common platforms (RStudio, Python) is good if it can be connected to relevant data sets. | Focus on outcomes such as empowering a class of users to recapitulate results published in model papers.
Inserting CFDE data into existing training programs. | Summer schools - common in other STEM fields | Recapitulate papers with (at least partial) use of CF data - materials to support this, e.g. https://allisonhorst.github.io/palmerpenguins/ or https://datacarpentry.org/genomics-workshop/ | Focus on the humans as the product of training -- i.e. coordinate amongst the trainees
Explicitly targeting students who are NOT affiliated with the DCCs | Master's or PhD or postbac students
Master's-level programs. | A T32 to train students to use CF data
Educational settings that themselves lack the local infrastructure for data generation/analysis | Recapitulating analyses published in model papers.
location-independent "data scholars" https://datascience.nih.gov/data-scholars-2022 | "Project-based-learning" models
Leverage the CFDE to expand the data science workforce | Using Google Docs

Table 6
New DCCs, any interested user | Jamborees (collaboration fest!), workshops, webinars, video walkthroughs | Yes | identify commonalities
Identify community of practice, to identify the users and chart their user journey | Matchmaking events | Training to onboard the new DCCs - newest DCCs need to understand CFDE consortium structure | share training strategies
Analytical modelers | Online MOOCs | how they submit data to the central repository; shared repository of materials
Data scientists, developers, implementers, clinicians, etc. | what is expected from them and what they should expect from the consortium | exchange communities of practice, visiting internships, grad student diaries for credible practices
Next generation: grad students, postdocs, ECR, ESI | Online tutorials, interactive learning environments, all of the above

Remote
Grad students (develop data-driven hypotheses) | MOOCs | Q3 (SPARC): it would be key to offer some basic training in graph databases and leverage distillery to combine knowledge from across CFDE DCCs | training working group
High school students (learn about the science of tomorrow) | Training programs where undergraduate students spend 8-10 weeks in a DCC to complete a project | Q3 (SPARC) contd: I support the idea of Education Games mentioned in Q2: have goal-driven questions to trace complex but biomedically interesting cross-graph paths | design a joint online course
Faculty - PIs (learn about new methodologies and resources) | Office hours | There are already a lot of great materials about training in Data Science and Bioinformatics out there, so it might be better to just point to them. | build upon what Titus' team developed so far
Bioinformaticians and Data Scientists (learn about tools and datasets that they can leverage) | summer internships / programs for grad students | Crowdsourcing challenges (?) | Have a system where existing training materials are announced, perhaps in the weekly newsletter.
Postdocs - good opportunity to introduce Common Fund to labs/groups that haven't worked with CF data yet | workshops at relevant conferences | There are already quite a few basic data science trainings. If we are going to create another basic training, it needs to use CF data for examples, ideally multiple. | Q4 (SPARC): this might make a strong case for a training partnership to be supported by NIH
Bioinformaticians and Data Scientists (learn about tools and datasets that they can leverage) | Online course for grad students with lectures from representatives from each DCC | Training on the tools CFDE is creating and maybe examples of how they can be applied | Q4 (SPARC) contd: This would require considerable planning for such a partnership in the current year.
Medical Doctors (learn about the future of deep phenotyping) | Educational games (?) | User manuals and video tutorials about CFDE products | joint workshop presentations/trainings that show how tools are unique but complementary
The General Public (learn about how their taxes are spent to advance medicine) | Q2 (SPARC): Navigational training: given the very extensive breadth of CFDE data coverage, pre-planned searches that introduce the user to the rich detail of our coverage.
Computer Science Undergrads (apply programming skills to analyze real biomedical data) | Q2 (SPARC): prep scenarios that frame a "quest" for data answering a well-defined biomedical question (find tissue location of drug transporter involved in disease X)
Experienced data scientists have indicated they find short videos helpful to orient/learn new tools
Trainees need to see value in training. If you can demonstrate a good use case for a tool/resource and publicize (publish?) you can attract interest and design training around it.
resources for training within an MD-PhD program