How the Ocean Health Index enables better science in less time
Julia Stewart Lowndes, PhD
Marine Data Scientist & Mozilla Fellow
National Center for Ecological Analysis & Synthesis, UC Santa Barbara
slides: jules32.github.io
twitter: @juliesquid
April 15, 2019
Bren School Seminar
UC Santa Barbara
Allison Horst
“Data science is the discipline of turning raw data into understanding”
Hadley Wickham
Statistician, Professor, Developer
Chief Scientist, RStudio
“Open data science involves mindsets and skillsets emphasizing efficiency, reproducibility, transparency, collaboration, communication, and kindness”
Me
Outline
oceanhealthindex.org
Halpern et al. 2012
A scientific method, tool, and community for channeling the best available scientific information into marine policy.
Important because in marine management there is a need for:
Global
Smaller scales
Assessments we lead
Assessments we enable
OHI+
A healthy ocean sustainably delivers a range of benefits to people now and in the future.
Employment
Cultural Identity and Sense of Place
Food Provision
goals
OHI framework
71
scores
models
data
inform
Repeatable OHI assessment process
Combining datasets
Documenting decisions & methods
Asking for & incorporating feedback
Collaborating
Revising and comparing
Analyzing and summarizing data
Reevaluating past decisions
Critically evaluating results
Communicating methods and results
Planning
Designing figures
Gathering data
Reading the literature
OHI, aka modern science
ohi-science.org
@ohiscience
vs.
data_final_final.xls
Re:FWD: data question
scripts/species_count.R
Issue: species count
How we work
connection to broader communities
open coding language
collaboration platform
docs, slides, sheets
Streamlined workflow
Figure adapted from Teucher 2018
Shared tools & practices
efficiency & reproducibility: coding and version control are the keystone
but also for game-changing collaboration & communication
Shared practices for reproducibility
RStudio for R, text editing, GitHub sync, and more
Shared coding practices; convenient interface for coding and syncing
R code (scripts and console)
File navigation, help, plots, packages
GitHub connection, environment, build
GitHub for archiving & bookkeeping
Convenient sharing with yourself and others
See what changed line-by-line
...and plot-by-plot
GitHub for discussion & project management
Individual to-dos
Shared & archived conversations
Project management and institutional memory
R + GitHub for documentation & communication
Website
OHI-Science.org
Protocols & methods
*made with RMarkdown*
Hands-on Training Books
R + GitHub for publication
Interactive applications
*made with shiny*
ohi-science.org/ohi-global
Shared tools & practices
Our workflow is more streamlined; efficient onboarding & offboarding
Enabling better science in less time
Learning with online communities
https://blog.mozilla.org
openscapes.org
@openscapes
Empower
Amplify
Engage
We champion open practices to help uncover data-driven solutions faster.
Build champions and communities
Build confidence and skills
Build awareness and excitement
Openscapes Champions
We help Champions & their labs:
Mentorship program that empowers environmental scientists with open data science tools and grows the community of practice
Openscapes Champions
Lessons based on Lowndes et al. 2017
Early lesson:
Openscapes
Progress and what’s next:
What can you do to engage with open data science?
1. Promote/enable the culture of open data science – even if you don’t code
2. Create/join communities, locally & online
3. Use existing online resources to learn & skillshare
4. Ask for open data science skills to be formally taught
How about TODAY:
- Talk about your data challenges with colleagues
- Share your next presentation online
- Use Twitter for science
- Follow selectively, listen & learn (e.g. #rstats, @nceas)
A few of UCSB’s many learning communities
eco-data-science.github.io
@ecodatasci
meetup.com/rladies-santa-barbara
@RLadiesSB
#TidyTuesday Hacky Hours
ESM 206 & 244
library.ucsb.edu/software-carpentry
NCEAS opportunities: internships, postdocs, research scientists:
nceas.ucsb.edu/employment
We can get to environmental solutions faster if we are more efficient & collaborative with how we do science.
Let’s do better science in less time, together.
Julia Stewart Lowndes, PhD
http://jules32.github.io
lowndes@nceas.ucsb.edu
@juliesquid
openscapes.org ohi-science.org
@openscapes @ohiscience
slides: jules32.github.io
Thank You
Extra slides
Using Twitter for science
My internal monologue:
Being open with your science
Twitter to learn and connect
Shared practices for reproducibility
Data wrangling: 50–80% of a data scientist’s time (Lohr 2014)
Tidy data
What if you needed D. opalescens 2017?
Untidy :(
Good for data entry, not good for data analysis because:
Species | 2016 | 2017 |
D. gigas | 398 | 139 |
D. opalescens | 663 | 447 |
O. rubescens | 423 | 739 |
Tidy !!
Great for data analysis because
species | year | count |
D. gigas | 2016 | 398 |
D. gigas | 2017 | 139 |
D. opalescens | 2016 | 663 |
D. opalescens | 2017 | 447 |
O. rubescens | 2016 | 423 |
O. rubescens | 2017 | 739 |
Tidy data
Examples from tidyr
tidyr::gather()
separate()
gather()
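A minimal sketch of the wide-to-long reshape behind the tidy data slide, using the squid and octopus counts from the two tables. The slides name R's tidyr::gather() for this step; the version below is plain Python, and the gather() helper here is a hypothetical stand-in written for illustration, not tidyr's API. It also assumes every non-species column name is a year.

```python
# Wide ("untidy") table from the slide: one column per year.
wide = [
    {"species": "D. gigas", "2016": 398, "2017": 139},
    {"species": "D. opalescens", "2016": 663, "2017": 447},
    {"species": "O. rubescens", "2016": 423, "2017": 739},
]


def gather(rows, id_col, key, value):
    """Hypothetical helper: reshape wide rows into one row per (id, key) pair."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue
            # Assumption: every non-id column name is a year, so cast it to int.
            long_rows.append({id_col: row[id_col], key: int(col), value: val})
    return long_rows


tidy = gather(wide, id_col="species", key="year", value="count")

# With tidy data, "D. opalescens 2017" becomes a plain row filter.
answer = [
    r["count"]
    for r in tidy
    if r["species"] == "D. opalescens" and r["year"] == 2017
]
print(answer)  # [447]
```

In the wide table the same question means knowing that the year is stored in a column name; in the tidy table, species and year are both ordinary values to filter on.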
Our Ocean Health Index story
Ocean management is complex
Need for science- and data-driven methods to measure what people care about
Need for standardized but flexible methods to assess different geographies
Need to streamline assessments from year-to-year to track change through time