The power of open science: experience from the Ocean Health Index
Julia Stewart Lowndes, PhD
National Center for Ecological Analysis and Synthesis
University of California at Santa Barbara, USA
lowndes@nceas.ucsb.edu
juliesquid jules32
jules32.github.io
February 27, 2018
SAFRED Keynote
Royal Belgian Institute of Natural Sciences
Brussels, Belgium
The power of open data science
Figure adapted from Teucher 2018
BEFORE: FWD:Re:data question, *_v1.xls, *_v9_JL.xls, *_final3_JL.doc
AFTER: *.R (synced)
Data analysis can be inefficient and demoralizing
when you’re alone without the right tools and skills
But open data science tools and community exist
that are powerful and empowering
The biggest obstacles to using these tools can be exposure to them and confidence in yourself
3 things that will empower you
and improve your science:
1. Data science is a discipline
2. Open data science tools exist
3. Learn with collaborators and community (redefined)
1. Data science is a discipline
“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016
Your study system is not unique when it comes to data
Not just for “big data”
There are concepts, theory, and tools for thinking about and working with data
1. Data science is a discipline
Why this matters for science
Think deliberately about data
Distinguish data questions from research questions, learn how to ask for help
Save heartache
You don’t have to reinvent the wheel
Save time
Expect there’s a better way to do what you are doing
Focus on the science
2. Open data science tools exist
“data science tools that enable open science” Lowndes et al. 2017
And they are developed by nice people
“the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers” Hampton et al. 2015
These tools are also game-changing for collaboration and communication
2. Open data science tools exist
Tools – some examples:
open coding language with shared practices
open version control and shared bookkeeping/organization
free online display/distribution
2. Open data science tools exist
Tools to match data science theory:
Wickham 2017
www.tidyverse.org
2. Open data science tools exist
Why this matters for science
Confidence in your analyses
Traceable, reusable record
Save time
Automation; think ahead of your immediate task; bookkeeping; collaboration
Convenient access
Work openly online (extended memory)
3. Learn with collaborators and community (redefined)
helps overcome isolation, self-taught bad practices, apprehension Stevens et al. 2018
3. Learn with collaborators and community (redefined)
Beyond the colleagues in your field
Your most important collaborator is Future You
Learn from, with, and for others
3. Learn with collaborators and community (redefined)
Why this matters for science
Learn to talk about your data
Find solutions faster
Build confidence
Skills are transferable beyond your science
Empathy and inclusion
Build a network of allies
Our story
1. Data science is a discipline
2. Open data science tools exist
3. Learn with collaborators and community (redefined)
Our path to better science in less time using open data science tools (Lowndes et al. 2017, Nature Ecology & Evolution)
2012: OHI published; 1st global study (Halpern et al., Nature)
2013: 2nd global study
Today: transparent and repeatable workflow, 7th global study underway, 20+ independent studies underway
1. Ocean Health Index embraces data science principles
Tidy data:
Data preparation: wrangling, formatting and other tasks that can take 50–80% of a data scientist’s time
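As a rough illustration of what tidying can involve, the sketch below reshapes a small "wide" table (one column per year) into a long table with one row per observation and one column per variable. It is written in Python for illustration only and is not the Ocean Health Index code, which uses R. The helper `to_long`, the region names, the column names, and the scores are all made up for the example.

```python
# Reshape a "wide" table (one column per year) into tidy long form:
# one row per observation, one column per variable.
# All names and values below are hypothetical.

wide_rows = [
    {"region": "A", "2012": 60, "2013": 62},
    {"region": "B", "2012": 71, "2013": 70},
]


def to_long(rows, id_col, var_name, value_name, cast=str):
    """Return one record per (id, variable) pair found in `rows`."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {id_col: row[id_col], var_name: cast(key), value_name: value}
            )
    return long_rows


tidy = to_long(wide_rows, id_col="region", var_name="year",
               value_name="score", cast=int)
for record in tidy:
    print(record)
```

Once the year is a value in its own column rather than a column name, filtering, grouping, and plotting by year no longer need special-case code.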
2. Ocean Health Index uses existing open data science tools
Figure adapted from Teucher 2018
3. OHI engages with collaborators and community (redefined)
OHI independent studies, by status: Planning, Analysis, Complete
1, 2, 3: All together now
Ocean Health Index analyses are repeated each year and are faster, easier, cheaper because we:
Why this matters for science
How did I get into open data science? (this stuff intimidated me)
A welcoming community.
I joined Twitter to learn #rstats
Doing.
Teaching/skill-sharing. (creating communities)
Open data science practices made me a better scientist
Broader thinking: I think differently about what scientific questions I can ask.
Empathy: I remember what it’s like to learn something new.
Confidence: I’ve gained confidence in myself in a transferable way.
-----
I feel so strongly about collaborative open data science that I’ve been moving away from my own research and towards enabling other scientists
So what can you do?
Promote and enable the culture of open data science
– even if you don’t code
Create/join community, locally and online
Use existing online resources to learn and skillshare
The power of collaborative open data science
To empower you and improve your science:
1. Data science is a discipline
2. Open data science tools exist
3. Learn with collaborators and community (redefined)
THANK YOU!
Extra slides
(to add for a longer talk)
2. Ocean Health Index uses existing open data science tools
Shared coding practices; convenient interface for coding and syncing
R code (scripts and console)
File navigation, help, plots, packages
GitHub connection, environment, build
2. Ocean Health Index uses existing open data science tools
See what changed line-by-line
...and plot by plot
Sharing code and bookkeeping; convenient interface for collaborating
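To make "line-by-line" concrete, here is a small sketch using Python's standard `difflib` module. It is not Git or GitHub, only an illustration of the kind of comparison a version-control diff displays; the two versions of the script and the file labels are made up.

```python
import difflib

# Two hypothetical versions of the same short analysis script.
old_script = [
    "scores <- read.csv('scores.csv')",
    "mean(scores$value)",
]
new_script = [
    "scores <- read.csv('scores.csv')",
    "mean(scores$value, na.rm = TRUE)",
]

# unified_diff yields header lines, then each line prefixed with
# " " (unchanged), "-" (removed), or "+" (added).
diff = list(
    difflib.unified_diff(
        old_script,
        new_script,
        fromfile="analysis.R (before)",
        tofile="analysis.R (after)",
        lineterm="",
    )
)
print("\n".join(diff))
```

The unchanged first line appears with a leading space, while the edited line shows up once with `-` and once with `+`, which is the record a collaborator (or Future You) can read later.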
2. Ocean Health Index uses existing open data science tools
Working openly GitHub.com
Website
OHI-Science.org
Hands-on Training Books
1. Ocean Health Index embraces data science principles
RStudio projects as GitHub repos
RMarkdown combines code & text...
2. Open data science tools exist
Tools to make it efficient to collaborate – example workflow
1. Data in Google Sheets
2. Data import with googlesheets package
3. Coded analysis, with shared practices, version control
4. Sync online (versioned code)
5. Available online for reference or use
2. Open data science tools exist
Tools to make it efficient to collaborate – example workflow
1. Data is in a PDF
2. Data import with tabulizer package
3. Coded analysis, with shared practices, version control
4. Sync online (versioned code)
5. Available online for reference or use
1. Ocean Health Index embraces data science principles
RMarkdown displayed as webpages or websites
Working openly online - for science and communication: github.com/ohi-science
Global assessment!
Manuscript website!
Training e-book!
OHI+ assessment!
Science website!
Collaborators!
Analyses – R code and text together (R Markdown)
Science website: ohi-science.org
Interactive websites for published articles:
http://ohi-science.nceas.ucsb.edu/plos_change_in_global_ocean_health
1. Ocean Health Index embraces data science principles
www.rstudio.com/resources/cheatsheets
Open science is the biggest opportunity for scientific advancement
Why we do science: To make a difference
Data skills are an unmet need
A survey of 704 US National Science Foundation principal investigators in the biological sciences found training in data skills to be the largest unmet need
Not just for “big data”: much of this is good practice for bookkeeping, organization, and remembering what you did. It matters even if your only collaborator at the moment is Future You
1. Ocean Health Index embraces data science principles
Communication: websites, books, and more
File naming: machine readable, human readable, plays well with default ordering
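A minimal sketch of these three properties, in Python rather than the R used elsewhere in the talk. The helper `make_name` and the example file names are hypothetical. The assumption is one common convention: ISO 8601 dates and zero-padded step numbers so names sort chronologically by default, underscores to separate fields for machines, and hyphens between words for human readers.

```python
import re
from datetime import date


def make_name(day, step, description, ext):
    """Build a name like 2018-02-27_01_clean-data.csv."""
    # Lowercase the description and collapse anything else into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}_{step:02d}_{slug}.{ext}"


names = [
    make_name(date(2018, 2, 27), 10, "Plot scores", "png"),
    make_name(date(2018, 2, 27), 2, "Clean data", "csv"),
    make_name(date(2017, 11, 5), 1, "Raw download", "csv"),
]

# Plays well with default ordering: alphabetical sort is chronological.
ordered = sorted(names)
for name in ordered:
    print(name)

# Machine readable: the fields split cleanly on "_".
day, step, rest = ordered[0].split("_", 2)
```

Without the zero padding, step 10 would sort before step 2, which is the kind of small surprise this convention avoids.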