Oct 16th 2024
Candace Savonen, M.S.
Data Scientist
Website: cansavvy.com
Code-based projects: github.com/cansavvy
csavonen@fredhutch.org
SKILLS
- Proficient in cutting-edge technologies such as AI and machine learning tools; has published courses on these techniques.
- 12+ years in genomic data analysis with an emphasis on open source and reproducibility.
- Expert in reproducible techniques for robust scientific workflow development using version control, Docker, GitHub Actions, and high-performance computing; has also published courses to teach others these skill sets.
- Demonstrable experience in creating reproducible workflows for genomics: single cell RNA-seq pipelines, CRISPR pipelines, and other data types. See the course she’s organized and led on these techniques.
- Skilled communicator – knows how to tailor her communication to her audience. See list of talks here.
- Enjoys working collaboratively in open source platforms and exploring new techniques; takes joy in doing well-organized, solid work.
- Experienced with APIs, having built R packages that interact with Google Cloud Platform.
EXPERIENCE
Fred Hutchinson Cancer Center - Data Scientist
June 2022 - PRESENT
- Working to democratize data science for the Fred Hutch Data Science Lab under the leadership of Chief Data Science Officer, Jeff Leek.
- Leading tech development for a number of reproducible software products and grants centered on software education. See cansavvy.com/other_media
- Collaboratively creating a library of open source best practices for data science. See cansavvy.com/courses
- Extensively published author of manuscripts, software packages and courses. See cansavvy.com/research
Johns Hopkins University - Research Associate
April 2021 - June 2021
- Began her work with Jeff Leek’s lab before following the transfer of her lab group to Fred Hutchinson Cancer Center.
Childhood Cancer Data Lab - Biological Data Analyst
Sept 2018 - April 2021
- Built a flexible pipeline for single cell RNA-seq data.
- Taught courses for cancer researchers on how to handle bulk and single cell RNA-seq data.
Michigan State University - Master of Science
June 2015 - Aug 2018
- Master’s thesis work was a bioinformatic analysis of methylation data in the context of Parkinson’s Disease.
Wayne State University - Research Assistant
May 2013 - June 2015
- Collected and processed postmortem human brain samples for gene expression studies of drug addiction.
- Studied gene expression through microarray and RNA-seq data processing.
PUBLICATION SUMMARY
- (2021 - Current) Most recently, I’ve been working on courses and tools to help bridge data science knowledge gaps.
- My colleagues and I have written about:
- Data science teaching approaches [1].
- How to evaluate informatics software [2,3].
- Using AI chatbots for science [4].
- I’ve also been a reviewer for:
- PLOS Computational Biology
- Journal of Statistics and Data Science Education
- (2018 - 2021) At the Childhood Cancer Data Lab, I helped equip childhood cancer researchers to use data science to empower their research [5–7].
- (2015 - 2018) In grad school, I worked on DNA methylation data in Parkinson’s disease [8].
- (2013 - 2015) I started out in neuroscience studying the transcriptomics of drug addiction [9–12].
PUBLICATIONS
1. Savonen C, Wright C, Hoffman A, Humphries E, Cox K, Tan F, et al. Motivation, inclusivity, and realism should drive data science education. F1000Research. 2024;12: 1240.
2. Afiaz A, Ivanov A, Chamberlin J, Hanauer D, Savonen C, Goldman MJ, et al. Evaluation of software impact designed for biomedical research: Are we measuring what’s meaningful? arXiv [cs.SE]. 2023. Available: http://arxiv.org/abs/2306.03255
3. Afiaz A, Ivanov AA, Chamberlin J, Hanauer D, Savonen CL, Goldman MJ, et al. Best practices to evaluate the impact of biomedical research software-metric collection beyond citations. Bioinformatics. 2024;40. doi:10.1093/bioinformatics/btae469
4. Humphries EM, Wright C, Hoffman AM, Savonen C, Leek JT. What’s the best chatbot for me? Researchers put LLMs through their paces. Nature. 2023 [cited 16 Oct 2024]. doi:10.1038/d41586-023-03023-4
5. Shapiro JA, Gaonkar KS, Savonen CL, Spielman SJ, Bethell CJ, Jin R, et al. OpenPBTA: An Open Pediatric Brain Tumor Atlas. 2022. doi:10.1101/2022.09.13.507832
6. Dang MT, Gonzalez MV, Gaonkar KS, Rathi KS, Young P, Arif S, et al. Macrophages in SHH subgroup medulloblastoma display dynamic heterogeneity that varies with treatment modality. Cell Reports. 2021;34: 108917.
7. Casey S. Greene, Dongbo Hu, Richard W. W. Jones, Stephanie Liu, David S. Mejia, Rob Patro, Stephen R. Piccolo, Ariel Rodriguez Romero, Hirak Sarkar, Candace L. Savonen, Jaclyn N. Taroni, William E. Vauclain, Deepashree Venkatesh Prasad, Kurt G. Wheeler. Refine.Bio. In: Refine.bio [Internet]. 2018 [cited 16 Oct 2024]. Available: https://www.refine.bio/
8. Kochmanski J, Savonen C, Bernstein AI. A Novel Application of Mixed Effects Models for Reconciling Base-Pair Resolution 5-Methylcytosine and 5-Hydroxymethylcytosine Data in Neuroepigenetics. Frontiers in Genetics. 2019;10. doi:10.3389/fgene.2019.00801
9. Saad MH, Savonen CL, Rumschlag M, Todi SV, Schmidt CJ, Bannon MJ. Opioid Deaths: Trends, Biomarkers, and Potential Drug Interactions Revealed by Decision Tree Analyses. Frontiers in Neuroscience. 2018;12: 728.
10. Saad MH, Rumschlag M, Guerra MH, Savonen CL, Jaster AM, Olson PD, et al. Differentially expressed gene networks, biomarkers, long noncoding RNAs, and shared responses with cocaine identified in the midbrains of human opioid abusers. Scientific Reports. 2019;9. doi:10.1038/s41598-018-38209-8
11. Bannon MJ, Savonen CL, Hartley ZJ, Johnson MM, Schmidt CJ. Investigating the potential influence of cause of death and cocaine levels on the differential expression of genes associated with cocaine abuse. PLoS One. 2015;10: e0117580.
12. Bannon MJ, Savonen CL, Jia H, Dachet F, Halter SD, Schmidt CJ, et al. Identification of long noncoding RNAs dysregulated in the midbrain of human cocaine abusers. J Neurochem. 2015;135: 50–59.
OTHER MEDIA
COMMITTEES
SOFTWARE
TEACHING
- Reproducibility tools, techniques and workflows in R for researchers.
github: https://github.com/fhdsl/Tools_for_Reproducible_Workflows_in_R
- ITCR Training Network courses
The ITN is a collaborative effort of researchers to catalyze informatics research through training opportunities (NCI UE5CA254170). ITN Courses are published on Leanpub and Coursera and on their own Bookdown websites.
Published ITN Courses:
- Documentation and Usability
A course to cover the basics of creating documentation and tutorials to maximize the usability of informatics tools.
github: https://github.com/jhudsl/Documentation_and_Usability
- Introduction to Reproducibility in Cancer Informatics
Equips learners with reproducibility skills they can apply to their existing analysis scripts and projects. This course opts for an “ease into it” approach.
github: https://github.com/jhudsl/Reproducibility_in_Cancer_Informatics
- Advanced Reproducibility in Cancer Informatics
Equips learners with a deeper knowledge of the capabilities of reproducibility tools and how they can apply them to their existing analysis scripts and projects.
github: https://github.com/jhudsl/Adv_Reproducibility_in_Cancer_Informatics
- Choosing Genomics Tools
Helps learners find resources and tools to process and interpret their genomic data.
github: https://github.com/fhdsl/Choosing_Genomics_Tools
- GitHub Automation for Scientists
This course walks through the whys and hows of using automation to improve the scientific software development process. It’s meant for folks who already have a basic familiarity with GitHub but would like to automate more of their software dev work.
github: https://github.com/fhdsl/GitHub_Automation_for_Scientists
- Containers for Scientists (currently under development)
This course walks through the whys and hows of using containers (e.g. Docker) to improve the reproducibility of scientific analyses.
github: https://github.com/fhdsl/Containers_for_Scientists
- refine.bio examples
A self-guided tutorial to help users analyze processed gene expression data from the refine.bio repository.
github: https://github.com/AlexsLemonade/refinebio-examples
- Childhood Cancer Data Lab RNA-seq Workshops
A short-format workshop (3 – 5 days) to introduce pediatric cancer researchers to the basics of single-cell and bulk RNA-seq data analysis.
github: https://github.com/AlexsLemonade/training-modules
PRESENTATIONS
2024
2023
2022
2021
2020
2019
- Childhood Cancer Data Lab workshop: Bulk and single-cell RNA-seq, Philadelphia, PA
- Childhood Cancer Data Lab workshop: Bulk and single-cell RNA-seq, San Francisco, CA
- Childhood Cancer Data Lab workshop: Bulk and single-cell RNA-seq, Chicago, IL
- Childhood Cancer Data Lab workshop: Bulk and single-cell RNA-seq, Houston, TX
POSTER PRESENTATIONS
- Differentially methylated gene networks in Parkinson’s disease
- (2018) International Society for Computational Biology Conference, Chicago, IL
- (2018) Michigan Chapter of Society for Neuroscience, Detroit, MI
- (2017) Society for Neuroscience, Washington, DC
- Microglia are more active in schizophrenia as evidenced by gene expression signatures
- (2017) Michigan Chapter of Society for Neuroscience, Ann Arbor, MI
- (2016) Michigan Chapter of Society for Neuroscience, East Lansing, MI
- Differential expression of lncRNAs in the ventral midbrain of cocaine abusers
- (2014) Michigan Chapter of Society for Neuroscience, Kalamazoo, MI
- Dopamine-cell specific expression of long non-coding RNAs in the ventral midbrain
- (2014) WSU Graduate Student Research Day, Detroit, MI
PERSONAL STATEMENT
My interest is in making data science tools more accessible to those who want to apply them with impact in their own areas of knowledge and background. I am passionate about creating educational materials that emphasize reproducibility and about using scalable methods to disseminate them. I believe that, as part of the data science community, we need to work toward a more inclusive environment. This would not only create better science, but would also widen the circle for individuals who are currently underrepresented in data science. I have been involved in creating and delivering bioinformatic education materials for cancer genomics. My neuroscience background has helped me empathize with researchers who are looking to bridge the data science knowledge gap.
Keywords
Bioinformatics, Data Science, Data Analysis, Education, Reproducibility, Gene Expression, Genomics