The power of open science: experience from the Ocean Health Index
Julia Stewart Lowndes, PhD
Mozilla Fellow
National Center for Ecological Analysis & Synthesis, UC Santa Barbara
slides: jules32.github.io
twitter: @juliesquid
October 24, 2018
Woods Hole Oceanographic Institution
Alternative title:
Let’s do science better together
Agenda
Data analysis can be inefficient and demoralizing
when you’re without the right tools/skills and you feel alone
Allison Horst
But open tools, practices, and communities exist
that are powerful and empowering
The biggest obstacles to using data science tools
can be exposure to them and confidence in yourself
And we can learn open practices for science
Our Ocean Health Index story
Ocean management is complex
Need for science- and data-driven methods to measure what people care about
Need for standardized but flexible methods to assess different geographies
Need to streamline assessments from year to year to track change through time
Our Ocean Health Index story
Timeline: 2012, OHI published; 1st global study (Halpern et al., Nature)
Repeatability was a priority
Timeline: 2013, 2nd global study
Our approaches were inadequate to efficiently reproduce our own work, largely because of data prep & collaboration
Examples: a file named data_final_final.xls; an email thread titled “Re: FWD: data question”
Our path to better science in less time using open data science tools (Lowndes et al. 2017, Nature Ecology & Evolution)
Timeline: today, transparent and repeatable workflow, 7th global study underway, 20+ independent studies
We learned:
Reproducibility in science requires reproducibility with data methods
Figure adapted from Teucher 2018
3 things were game-changing for us.
And they will be for you too...
3 things that will empower you and improve your science:
1. Data science is a discipline
2. Open data science tools exist
3. Learn with collaborators and community (redefined)
1. Data science is a discipline
“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016
Tidy your data first. Then, ask your research questions
Why this matters for science
Think deliberately about data: distinguish data questions from research questions, and learn how to ask for help
Save heartache: you don’t have to reinvent the wheel
Save time: expect there’s a better way to do what you are doing
Focus on the science
Tidy data first: don’t accommodate messy data; instead, analyze tidy data
Emphasize documentation and communication throughout: use the same tools & practices to publish websites & distribute resources
Have deliberate shared practices for data structure, filenaming, and organization
Focus on data preparation: wrangling, formatting, and other tasks that can take 50–80% of a data scientist’s time
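A minimal sketch of what "tidy data first" can look like in practice. It uses plain Python rather than the R/tidyverse tools this talk describes, and the table, region names, scores, and the helper `tidy_by_year` are all hypothetical, invented only for illustration. The idea is to reshape a wide table (one column per year) into tidy rows (one observation per row) before asking any research question.

```python
# Hypothetical "wide" table: one row per region, one column per year.
# Region names and scores are invented for illustration only.
wide_rows = [
    {"region": "A", "2012": 60, "2013": 62},
    {"region": "B", "2012": 71, "2013": 69},
]


def tidy_by_year(rows):
    """Reshape wide rows into tidy rows: one region-year observation per row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == "region":
                continue  # identifier column, not a measurement
            tidy_rows.append(
                {"region": row["region"], "year": int(column), "score": value}
            )
    return tidy_rows


tidy_rows = tidy_by_year(wide_rows)

# Once the data are tidy, a research question becomes a simple filter.
scores_2013 = [r["score"] for r in tidy_rows if r["year"] == 2013]
mean_2013 = sum(scores_2013) / len(scores_2013)

for r in tidy_rows:
    print(r)
print("mean score in 2013:", mean_2013)
```

Keeping the reshaping step separate from the question-asking step is the point: the tidy rows can be reused for any later question without touching the messy source again.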
2. Open data science tools exist
“data science tools that enable open science” Lowndes et al. 2017
Open science: “the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers” Hampton et al. 2015
2. Open data science tools exist
Tools – some examples:
open coding language with shared practices
shared bookkeeping/organization
free online display/distribution
Tools to match data science theory:
graphic: Wickham 2017
www.tidyverse.org
Why this matters for science
Confidence in your analyses: traceable, reusable record
Save time: automation; think ahead of your immediate task; bookkeeping; collaboration; embrace existing structure for organization
Convenient access: work openly online (extended memory)
Figure adapted from Teucher 2018
Coding and version control are the keystone,
but also effective collaboration and communication
Shared coding practices; convenient interface for coding and syncing
[Screenshot labels: R code (scripts and console); file navigation, help, plots, packages; GitHub connection, environment, build]
Sharing code and bookkeeping; convenient interface for collaborating
See what changed line-by-line
...and plot-by-plot
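To make "see what changed line-by-line" concrete, here is a small sketch using Python's standard-library `difflib`. It is not Git or GitHub, only an illustration of the same idea of a line-by-line record of change. The two script versions and the file names `analysis_old.py` and `analysis_new.py` are hypothetical.

```python
import difflib

# Two hypothetical versions of a small analysis script (invented for illustration).
old_version = [
    "scores = load_scores()",
    "mean = sum(scores) / len(scores)",
    "print(mean)",
]
new_version = [
    "scores = load_scores()",
    "scores = [s for s in scores if s is not None]",
    "mean = sum(scores) / len(scores)",
    "print(mean)",
]

# unified_diff marks added lines with "+" and removed lines with "-".
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="analysis_old.py",
        tofile="analysis_new.py",
        lineterm="",
    )
)
print("\n".join(diff_lines))

# Collect the changed lines, skipping the "+++" and "---" file headers.
added = [l[1:] for l in diff_lines if l.startswith("+") and not l.startswith("+++")]
removed = [l[1:] for l in diff_lines if l.startswith("-") and not l.startswith("---")]
```

A version control system keeps this kind of record for every saved change, along with who made it and why, which is what makes the bookkeeping shareable.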
Working openly
GitHub.com
Website: OHI-Science.org
Hands-on training books
Our workflow is more streamlined; efficient onboarding & offboarding
Live Demo!
3. Learn with collaborators and community (redefined)
Helps overcome isolation, self-taught bad practices, and apprehension (Stevens et al. 2018)
Why this matters for science
Learn to talk about your data
Find solutions faster
Build confidence
Skills are transferable beyond your science
Be empathic and inclusive
Build a network of allies
[Figure labels: Planning, Analysis, Complete]
1, 2, 3: All together now
Ocean Health Index analyses are transparent and reproducible.
We repeat them each year and it’s faster, easier, cheaper.
Training is a priority.
How did I get into open data science? (this stuff intimidated me)
A welcoming community.
I joined Twitter to learn #rstats
Doing.
Teaching/skill-sharing. (creating communities)
**I use the internet for everything**
Open data science practices made me a better scientist
Broader thinking: I think differently about what scientific questions I can ask.
Empathy: I remember what it’s like to learn something new.
Confidence: I’ve gained confidence in myself in a transferable way.
-----
I feel so strongly about collaborative open data science that I’ve been moving away from my own research and toward enabling other environmental scientists to make their work as impactful as possible.
www.mozilla.org
www.dataone.org/previous-webinars/2018
https://blog.mozilla.org
2. Empower
3. Amplify
Build champions and communities
Build confidence and skills
Build awareness and excitement
Openscapes
Opening the landscapes of environmental science
through open community, data, and code
So what can you do?
Promote/enable the culture of open science, even if you don’t code
Create/join community, locally & online
Use existing online resources to learn & skillshare
Learn with intention – not in a panic, beyond a single purpose
**Coming soon: openscapes.org**
twitter: @openscapes
So what can you do TODAY?
Engage online, use Twitter for science
Schedule lab/group “seaside chats”
Share your next presentation online
Talk about your data challenges with colleagues
To empower you and improve your science:
1. Data science is a discipline
2. Open data science tools exist
3. Learn with collaborators and community (redefined)
ohi-science.org
Thank you!!
Uncertainty and data gaps
Marine species distributions
Global fisheries modeling
Public priorities
3. Learn with collaborators and community (redefined)
Afflerbach et al. in review
2. Open data science tools exist
Data prep displayed as webpages:
Themes
-community
-outline emphasize for everyone
Moz, open practices
GitHub+R: not just for code anymore
Use Twitter