1 of 62

The power of open science: experience from the Ocean Health Index

Julia Stewart Lowndes, PhD

Mozilla Fellow

National Center for Ecological Analysis & Synthesis, UC Santa Barbara

slides: jules32.github.io

twitter: @juliesquid

October 24, 2018

Woods Hole Oceanographic Institution

2 of 62

Alternative title:

Let’s do science better together

Agenda

  1. Ocean Health Index
  2. Three things to empower you and improve your science
  3. Mozilla
  4. What you can do

3 of 62

Data analysis can be inefficient and demoralizing

when you lack the right tools and skills, and you feel alone

Allison Horst


4 of 62

But open tools, practices, and communities exist

that are powerful and empowering

Allison Horst

5 of 62

The biggest obstacles to using data science tools

can be a lack of exposure to them and a lack of confidence in yourself

And we can learn open practices for science

Allison Horst

6 of 62

Our Ocean Health Index story

Ocean management is complex

Need for science- and data-driven methods to measure what people care about

Need for standardized but flexible methods to assess different geographies

Need to streamline assessments from year to year to track change over time

7 of 62

Our Ocean Health Index story

[Timeline figure along a time axis]

2012: OHI published; 1st global study (Halpern et al. 2012 Nature)

8 of 62

Our Ocean Health Index story

Repeatability was a priority

  • Detailed notes on data processing
  • Coded models
  • Published 130 pages of SOM
  • Shared modeled data on FTP

9 of 62

Our Ocean Health Index story

2013: 2nd global study

10 of 62

Our Ocean Health Index story

Our approaches were inadequate for efficiently reproducing our own work, largely because of data preparation and collaboration

data_final_final.xls

Re: FWD: data question

11 of 62

Our Ocean Health Index story

Our path to better science in less time using open data science tools. Lowndes et al. 2017 Nature Ecology & Evolution

Today: transparent and repeatable workflow, 7th global study underway, 20+ independent studies

12 of 62

Our Ocean Health Index story

We learned:

Reproducibility in science requires reproducibility with data methods

13 of 62

Our Ocean Health Index story

Figure adapted from Teucher 2018

14 of 62

Our Ocean Health Index story

3 things were game-changing for us, and they will be for you too...

15 of 62

3 things that will empower you and improve your science:

1. Data science is a discipline

2. Open data science tools exist

3. Learn with collaborators and community (redefined)

16 of 62

1. Data science is a discipline

17 of 62

1. Data science is a discipline

“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016

  • There are concepts, theory, and tools for thinking about and working with data
  • Emphasis on communication
  • Not just for “big data”
  • Your study system is not unique when it comes to data

18 of 62

1. Data science is a discipline

19 of 62

1. Data science is a discipline

Tidy your data first. Then ask your research questions.

20 of 62

1. Data science is a discipline

Why this matters for science:

  • Think deliberately about data
  • Distinguish data questions from research questions, learn how to ask for help
  • Save heartache
  • You don’t have to reinvent the wheel
  • Save time
  • Expect there’s a better way to do what you are doing
  • Focus on the science

21 of 62

1. Data science is a discipline

  • Tidy data first: don’t accommodate messy data; instead, analyze tidy data
  • Emphasize documentation and communication throughout: use the same tools and practices to publish websites and distribute resources
  • Have deliberate shared practices for data structure, file naming, and organization
  • Focus on data preparation: wrangling, formatting, and other tasks that can take 50–80% of a data scientist’s time
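The "tidy data first" idea can be sketched in a few lines. This is a minimal Python illustration, not the OHI workflow itself (which uses R and the tidyverse): it assumes a hypothetical wide table of scores with one column per year (region names and values are made up) and reshapes it so each row is one observation, similar in spirit to `gather()` from R's tidyr package. Once the data are tidy, a research question such as "what is the mean score each year?" becomes a simple grouping step.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def tidy_wide_table(rows: List[Row], id_col: str,
                    var_name: str, value_name: str) -> List[Row]:
    """Turn each non-id column of a wide table into its own row (wide -> long)."""
    tidy_rows: List[Row] = []
    for row in rows:
        for column, value in row.items():
            if column == id_col:
                continue  # the id column is repeated on every tidy row instead
            tidy_rows.append({id_col: row[id_col], var_name: column, value_name: value})
    return tidy_rows


# Messy layout (hypothetical data): years are spread across columns.
wide = [
    {"region": "Region A", "2012": 70, "2013": 72},
    {"region": "Region B", "2012": 65, "2013": 64},
]

# Tidy layout: one row per region-year observation.
tidy = tidy_wide_table(wide, id_col="region", var_name="year", value_name="score")

# With tidy data, the research question is a simple group-by on "year".
scores_by_year: Dict[str, List[float]] = defaultdict(list)
for record in tidy:
    scores_by_year[record["year"]].append(record["score"])
mean_by_year = {year: sum(s) / len(s) for year, s in scores_by_year.items()}
print(mean_by_year)
```

Note that `year` stays a string here because it came from a column name; converting it with `int()` during tidying is a reasonable next step.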

22 of 62

2. Open data science tools exist

23 of 62

2. Open data science tools exist

“data science tools that enable open science” Lowndes et al. 2017

Open science: “the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers” Hampton et al. 2015

  • These tools are also game-changing for collaboration and communication
  • And they are developed by nice people

24 of 62

2. Open data science tools exist

Tools – some examples:

open coding language with shared practices

25 of 62

2. Open data science tools exist

Tools – some examples:

open coding language with shared practices

shared bookkeeping/organization

free online display/distribution

26 of 62

2. Open data science tools exist

Tools to match data science theory:

graphic: Wickham 2017

www.tidyverse.org

27 of 62

2. Open data science tools exist

Tools to match data science theory:

Wickham 2017

Person 1

Person 2

28 of 62

2. Open data science tools exist

Why this matters for science:

  • Confidence in your analyses: traceable, reusable record
  • Save time: automation; think ahead of your immediate task; bookkeeping; collaboration; embrace existing structure for organization
  • Convenient access: work openly online (extended memory)

29 of 62

2. Open data science tools exist

Figure adapted from Teucher 2018

30 of 62

2. Open data science tools exist

Coding and version control are the keystone

31 of 62

2. Open data science tools exist

but also effective collaboration and communication

32 of 62

2. Open data science tools exist

Shared coding practices; convenient interface for coding and syncing

R code (scripts and console)

File navigation, help, plots, packages

GitHub connection, environment, build

33 of 62

2. Open data science tools exist

See what changed line-by-line

...and plot-by-plot

Sharing code and bookkeeping; convenient interface for collaborating
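GitHub shows changes between versions line by line. As a rough stand-in for that view, Python's standard-library `difflib` can produce a unified diff of two versions of a file. This is only an illustration of the output format: `difflib` uses its own matching algorithm, so on larger changes its output can differ from Git's. The file names and contents below are hypothetical.

```python
import difflib

# Two versions of a hypothetical data file, one string per line.
old_version = [
    "region,year,score",
    "Region A,2012,70",
    "Region B,2012,65",
]
new_version = [
    "region,year,score",
    "Region A,2012,70",
    "Region B,2012,66",  # corrected value
    "Region C,2012,58",  # new row
]

# unified_diff marks removed lines with "-" and added lines with "+",
# and keeps unchanged neighboring lines (prefixed with a space) as context.
diff = list(difflib.unified_diff(
    old_version, new_version,
    fromfile="scores_v1.csv", tofile="scores_v2.csv",
    lineterm="",  # the input lines have no trailing newlines
))
print("\n".join(diff))
```

The `-` and `+` lines play the same role as the red and green lines in GitHub's diff view.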

34 of 62

2. Open data science tools exist

Working openly: GitHub.com

Website: OHI-Science.org

Hands-on Training Books

35 of 62

2. Open data science tools exist

Our workflow is more streamlined; efficient onboarding & offboarding

36 of 62

Live Demo!

37 of 62

3. Learn with collaborators and community (redefined)

38 of 62

3. Learn with collaborators and community (redefined)

helps overcome isolation, self-taught bad practices, and apprehension. Stevens et al. 2018

  • Your most important collaborator is Future You
  • Communities beyond the colleagues in your field
  • Learn from, with, & for others

39 of 62

3. Learn with collaborators and community (redefined)

40 of 62

3. Learn with collaborators and community (redefined)

Why this matters for science:

  • Learn to talk about your data
  • Find solutions faster
  • Build confidence
  • Skills are transferable beyond your science
  • Be empathic and inclusive
  • Build a network of allies

41 of 62

3. Learn with collaborators and community (redefined)

Planning Analysis Complete

42 of 62

1, 2, 3: All together now

Ocean Health Index analyses are transparent and reproducible.

We repeat them each year, and it’s faster, easier, and cheaper.

43 of 62

1, 2, 3: All together now

Training is a priority.

  • OHI Fellows
  • The Carpentries
  • NCEAS Trainings
  • Intro to Open Data Science
    • OHI-Science YouTube

44 of 62

How did I get into open data science? (this stuff intimidated me)

45 of 62

How did I get into open data science? (this stuff intimidated me)

A welcoming community.

I joined Twitter to learn #rstats

  • @rOpenSci
  • @RStudio #tidyverse
  • @RLadies
  • Other awesome data scientists

46 of 62

How did I get into open data science? (this stuff intimidated me)

Doing.

  • Trying things, doing tutorials, Googling errors
  • Websites/blogs/tutorials/slides as extended memory (jules32.github.io, ohi-science.org)

47 of 62

How did I get into open data science? (this stuff intimidated me)

Teaching/skill-sharing (creating communities).

  • Seaside chats (lab)
  • Eco-Data-Science Study Group (UCSB)
  • Software/Data Carpentries (global)
  • Ocean Health Index (global)
  • RLadies (global)

48 of 62

How did I get into open data science? (this stuff intimidated me)

**I use the internet for everything**

49 of 62

Open data science practices made me a better scientist

Broader thinking: I think differently about what scientific questions I can ask.

Empathy: I remember what it’s like to learn something new.

Confidence: I’ve gained confidence in myself in a transferable way.

50 of 62

Open data science practices made me a better scientist

I feel so strongly about collaborative open data science that I’ve been moving away from my own research and toward enabling other environmental scientists, so that their work can have the greatest impact.

51 of 62

www.mozilla.org

52 of 62

www.dataone.org/previous-webinars/2018

53 of 62

https://blog.mozilla.org

54 of 62

Openscapes

Opening the landscapes of environmental science through open community, data, and code

  1. Engage: build awareness and excitement
  2. Empower: build confidence and skills
  3. Amplify: build champions and communities

55 of 62

So what can you do?

Promote/enable the culture of open science, even if you don’t code

Create/join community, locally & online

Use existing online resources to learn & skillshare

Learn with intention: not in a panic, and not just for a single purpose

**Coming soon: openscapes.org**

twitter: @openscapes

ohi-science.org/betterscienceinlesstime

56 of 62

So what can you do TODAY?

Engage online; use Twitter for science

  • Follow selectively, listen & learn (e.g. #rstats, @nceas)

Schedule lab/group “seaside chats”

Share your next presentation online

Talk about your data challenges with colleagues

57 of 62

To empower you and improve your science:

1. Data science is a discipline

2. Open data science tools exist

3. Learn with collaborators and community (redefined)

58 of 62

Julia Stewart Lowndes, PhD

Mozilla Fellow

NCEAS, UC Santa Barbara

slides: jules32.github.io

twitter: @juliesquid

October 24, 2018

Woods Hole Oceanographic Institution

ohi-science.org

Thank you!!

59 of 62

60 of 62

3. Learn with collaborators and community (redefined)

Uncertainty and data gaps

Marine species distributions

Global fisheries modeling

Public priorities

Afflerbach et al. in review

61 of 62

2. Open data science tools exist

Data prep displayed as webpages:

62 of 62

Themes

-community

-outline emphasize for everyone

Moz, open practices

GitHub+R: not just for code anymore

Use twitter