1 of 82

The power of open science: experience from the Ocean Health Index

Julia Stewart Lowndes, PhD

National Center for Ecological Analysis and Synthesis

University of California at Santa Barbara, USA

lowndes@nceas.ucsb.edu

@juliesquid (Twitter), jules32 (GitHub)

jules32.github.io

February 27, 2018

SAFRED Keynote

Royal Belgian Institute of Natural Sciences

Brussels, Belgium

[“data” inserted by caret into the title: “The power of open data science”]

2 of 82

The power of open data science

BEFORE

Figure adapted from Teucher 2018

FWD:Re:data question

*_v1.xls

*_v9_JL.xls

*_final3_JL.doc

3 of 82

The power of open data science

AFTER

BEFORE

(synced)

FWD:Re:data question

*_v1.xls

*_v9_JL.xls

*.R

Figure adapted from Teucher 2018

*_final3_JL.doc

4 of 82

Data analysis can be inefficient and demoralizing

when you’re alone without the right tools and skills

5 of 82

But open data science tools and community exist

that are powerful and empowering

6 of 82

The biggest obstacles to using these tools can be exposure to them and confidence in yourself

7 of 82

3 things that will empower you

and improve your science:

8 of 82

1. Data science is a discipline

3 things that will empower you

and improve your science:

9 of 82

1. Data science is a discipline

2. Open data science tools exist

3 things that will empower you

and improve your science:

10 of 82

1. Data science is a discipline

2. Open data science tools exist

3. Learn with collaborators and community (redefined)

3 things that will empower you

and improve your science:

11 of 82

1. Data science is a discipline

12 of 82

1. Data science is a discipline

“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016

13 of 82

1. Data science is a discipline

“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016

There are concepts, theory, and tools for thinking about and working with data

14 of 82

1. Data science is a discipline

“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016

Not just for “big data”

There are concepts, theory, and tools for thinking about and working with data

15 of 82

1. Data science is a discipline

“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016

Your study system is not unique when it comes to data

Not just for “big data”

There are concepts, theory, and tools for thinking about and working with data

16 of 82

1. Data science is a discipline

“the discipline of turning raw data into understanding.” Wickham & Grolemund 2016

17 of 82

1. Data science is a discipline

Think deliberately about data

Distinguish data questions from research questions, learn how to ask for help

Why this matters for science

18 of 82

1. Data science is a discipline

Think deliberately about data

Distinguish data questions from research questions, learn how to ask for help

Save heartache

You don’t have to reinvent the wheel

Why this matters for science

19 of 82

1. Data science is a discipline

Think deliberately about data

Distinguish data questions from research questions, learn how to ask for help

Save heartache

You don’t have to reinvent the wheel

Save time

Expect there’s a better way to do what you are doing

Focus on the science

Why this matters for science

20 of 82

2. Open data science tools exist

21 of 82

2. Open data science tools exist

“data science tools that enable open science” Lowndes et al. 2017

22 of 82

2. Open data science tools exist

“data science tools that enable open science” Lowndes et al. 2017

“the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers” Hampton et al. 2015

23 of 82

2. Open data science tools exist

“data science tools that enable open science” Lowndes et al. 2017

“the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers” Hampton et al. 2015

These tools are also game-changing for collaboration and communication

24 of 82

2. Open data science tools exist

“data science tools that enable open science” Lowndes et al. 2017

And they are developed by nice people

“the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers” Hampton et al. 2015

These tools are also game-changing for collaboration and communication

25 of 82

2. Open data science tools exist

Tools – some examples:

open coding language with shared practices

open version control and shared bookkeeping/organization

free online display/distribution

26 of 82

2. Open data science tools exist

Tools to match data science theory:

Wickham 2017

www.tidyverse.org

27 of 82

2. Open data science tools exist

Tools to match data science theory:

Wickham 2017

Person 1

Person 2

28 of 82

2. Open data science tools exist

Confidence in your analyses

Traceable, reusable record

Why this matters for science

29 of 82

2. Open data science tools exist

Confidence in your analyses

Traceable, reusable record

Save time

Automation; think ahead of your immediate task; bookkeeping; collaboration

Why this matters for science

30 of 82

2. Open data science tools exist

Confidence in your analyses

Traceable, reusable record

Save time

Automation; think ahead of your immediate task; bookkeeping; collaboration

Convenient access

Work openly online (extended memory)

Why this matters for science

31 of 82

3. Learn with collaborators and community (redefined)

32 of 82

3. Learn with collaborators and community (redefined)

helps overcome isolation, self-taught bad practices, apprehension Stevens et al. 2018

33 of 82

3. Learn with collaborators and community (redefined)

Beyond the colleagues in your field

34 of 82

3. Learn with collaborators and community (redefined)

Your most important collaborator is Future You

Beyond the colleagues in your field

35 of 82

3. Learn with collaborators and community (redefined)

Your most important collaborator is Future You

Learn from, with, and for others

Beyond the colleagues in your field

36 of 82

3. Learn with collaborators and community (redefined)

37 of 82

3. Learn with collaborators and community (redefined)

Learn to talk about your data

Find solutions faster

Why this matters for science

38 of 82

3. Learn with collaborators and community (redefined)

Learn to talk about your data

Find solutions faster

Build confidence

Skills are transferable beyond your science

Why this matters for science

39 of 82

3. Learn with collaborators and community (redefined)

Learn to talk about your data

Find solutions faster

Build confidence

Skills are transferable beyond your science

Empathy and inclusion

Build a network of allies

Why this matters for science

40 of 82

Our story

41 of 82

Our story

Our path to better science in less time using open data science tools

Lowndes et al. 2017 Nature Ecology & Evolution

Timeline, from then to today:

2012: OHI published; 1st global study (Halpern et al. Nature)

2013: 2nd global study

Today: transparent and repeatable workflow, 7th global study underway, 20+ independent studies underway

42 of 82

Our story

1. Data science is a discipline

3. Learn with collaborators and community (redefined)

2. Open data science tools exist

Our path to better science in less time using open data science tools

Lowndes et al. 2017 Nature Ecology & Evolution

Timeline, from then to today:

2012: OHI published; 1st global study (Halpern et al. Nature)

2013: 2nd global study

Today: transparent and repeatable workflow, 7th global study underway, 20+ independent studies underway

43 of 82

1. Ocean Health Index embraces data science principles

Data preparation: wrangling, formatting and other tasks that can take 50–80% of a data scientist’s time

44 of 82

1. Ocean Health Index embraces data science principles

Tidy data:

Data preparation: wrangling, formatting and other tasks that can take 50–80% of a data scientist’s time
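
As a minimal illustration of the tidy-data idea (one row per observation, one column per variable), the sketch below reshapes a small wide table into long form. It is written in Python with only the standard library rather than the R/tidyverse tools this talk describes, and the region names, years, and scores are invented placeholders, not Ocean Health Index data.

```python
# Hypothetical wide table: one row per region, one column per year.
# All names and values are invented for illustration.
wide_rows = [
    {"region": "A", "2012": 60, "2013": 62},
    {"region": "B", "2012": 71, "2013": 70},
]


def to_tidy(rows, id_col, var_name, value_name):
    """Return one record per (id, variable) pair, i.e. one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is repeated on every output row
            tidy.append({id_col: row[id_col], var_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_col="region", var_name="year", value_name="score")
for record in tidy_rows:
    print(record)
```

In the long form, adding another year means adding rows rather than columns, so code that filters or groups by year does not need to change.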

45 of 82

2. Ocean Health Index uses existing open data science tools

Figure adapted from Teucher 2018

46 of 82

2. Ocean Health Index uses existing open data science tools

47 of 82

2. Ocean Health Index uses existing open data science tools

48 of 82

3. OHI engages with collaborators and community (redefined)

49 of 82

3. OHI engages with collaborators and community (redefined)

OHI independent studies:

Planning

Analysis

Complete

50 of 82

1, 2, 3: All together now

Ocean Health Index analyses are repeated each year and are faster, easier, cheaper because we:

  1. embrace data science principles
  2. use existing collaborative open data science tools
  3. engage collaborators and community (redefined)

Why this matters for science

51 of 82

1, 2, 3: All together now

Ocean Health Index analyses are repeated each year and are faster, easier, cheaper because we:

  • embrace data science principles
  • use existing collaborative open data science tools
  • engage collaborators and community (redefined)

Why this matters for science

52 of 82

How did I get into open data science? (this stuff intimidated me)

53 of 82

How did I get into open data science? (this stuff intimidated me)

A welcoming community.

I joined Twitter to learn #rstats

  • @rOpenSci
  • @RStudio #tidyverse
  • @RLadies
  • Other awesome data scientists

54 of 82

How did I get into open data science? (this stuff intimidated me)

A welcoming community.

I joined Twitter to learn #rstats

  • @rOpenSci
  • @RStudio #tidyverse
  • @RLadies
  • Other awesome data scientists

Doing.

  • Trying things, doing tutorials, Googling errors
  • Websites/blogs/tutorials/slides as extended memory (jules32.github.io, ohi-science.org)

55 of 82

How did I get into open data science? (this stuff intimidated me)

A welcoming community.

I joined Twitter to learn #rstats

  • @rOpenSci
  • @RStudio #tidyverse
  • @RLadies
  • Other awesome data scientists

Doing.

  • Trying things, doing tutorials, Googling errors
  • Websites/blogs/tutorials/slides as extended memory (jules32.github.io, ohi-science.org)

Teaching/skill-sharing. (creating communities)

  • Seaside chats (lab)
  • Eco-Data-Science Study Group (Uni)
  • Software/Data Carpentries (global)
  • Ocean Health Index (global)
  • RLadies (global) (upcoming!)

56 of 82

Open data science practices made me a better scientist

Broader thinking: I think differently about what scientific questions I can ask.

Empathy: I remember what it’s like to learn something new.

Confidence: I’ve gained confidence in myself in a transferable way.

57 of 82

Open data science practices made me a better scientist

Broader thinking: I think differently about what scientific questions I can ask.

Empathy: I remember what it’s like to learn something new.

Confidence: I’ve gained confidence in myself in a transferable way.

-----

I feel so strongly about collaborative open data science that I’ve been moving away from my own research and towards enabling other scientists

58 of 82

So what can you do?

Promote and enable the culture of open data science

– even if you don’t code

59 of 82

So what can you do?

Promote and enable the culture of open data science

– even if you don’t code

Create/join community, locally and online

60 of 82

So what can you do?

Promote and enable the culture of open data science

– even if you don’t code

Create/join community, locally and online

Use existing online resources to learn and skillshare

ohi-science.org/betterscienceinlesstime

61 of 82

The power of collaborative open data science

To empower you and improve your science:

1. Data science is a discipline

2. Open data science tools exist

3. Learn with collaborators and community (redefined)

THANK YOU!

Julia Stewart Lowndes, PhD

lowndes@nceas.ucsb.edu

@juliesquid (Twitter), jules32 (GitHub)

jules32.github.io

February 27, 2018

SAFRED Keynote

Royal Belgian Institute of Natural Sciences

Brussels, Belgium

62 of 82

63 of 82

Extra slides

(to add for a longer talk)

64 of 82

2. Ocean Health Index uses existing open data science tools

Shared coding practices; convenient interface for coding and syncing

R code (scripts and console)

File navigation, help, plots, packages

GitHub connection, environment, build

65 of 82

2. Ocean Health Index uses existing open data science tools

See what changed line-by-line

...and plot by plot

Sharing code and bookkeeping; convenient interface for collaborating
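
To give a feel for what a line-by-line view of changes looks like, here is a small sketch using Python's standard `difflib` module. It is not how Git or GitHub compute their diffs internally; the two versions of the script and the file labels are hypothetical, invented only for illustration.

```python
import difflib

# Two hypothetical versions of the same analysis script (invented content).
old_version = [
    "scores <- read.csv('scores.csv')\n",
    "mean(scores$value)\n",
]
new_version = [
    "scores <- read.csv('scores.csv')\n",
    "mean(scores$value, na.rm = TRUE)\n",
]

# unified_diff yields header lines, then lines prefixed with
# " " (unchanged), "-" (removed) or "+" (added).
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="analysis.R (before)",
        tofile="analysis.R (after)",
    )
)
print("".join(diff_lines))
```

The output marks the single edited line as one removal and one addition, which is the kind of record that makes it possible to see exactly what a collaborator (or Future You) changed.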

66 of 82

2. Ocean Health Index uses existing open data science tools

Working openly: GitHub.com

Website

OHI-Science.org

Hands-on Training Books

67 of 82

1. Ocean Health Index embraces data science principles

RStudio projects as GitHub repos

RMarkdown combines code & text...

68 of 82

69 of 82

2. Open data science tools exist

Tools to make it efficient to collaborate – example workflow

1. Data in Google Sheets

2. Data import with googlesheets package

3. Coded analysis, with shared practices, version control

4. Sync online (versioned code)

5. Available online for reference or use
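
The first three steps of this workflow can be sketched roughly as follows. This is a Python standard-library stand-in, not the googlesheets R package named on the slide: the inline CSV text takes the place of data imported from a shared sheet so that the sketch runs offline, and the column names and values are invented.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for steps 1 and 2: data that would be imported from a shared sheet.
# Inline CSV text with invented values keeps the sketch self-contained.
SHEET_CSV = """site,year,count
north,2016,12
north,2017,15
south,2016,7
south,2017,9
"""


def mean_count_by_site(csv_text):
    """Step 3: a coded, repeatable analysis instead of manual spreadsheet edits."""
    counts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["site"]].append(int(row["count"]))
    return {site: mean(values) for site, values in counts.items()}


summary = mean_count_by_site(SHEET_CSV)
print(summary)
```

Because the analysis lives in a script, rerunning it after the sheet is updated repeats every step exactly, and the script itself is what gets versioned and synced online in steps 4 and 5.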

70 of 82

2. Open data science tools exist

Tools to make it efficient to collaborate – example workflow

1. Data is in a PDF

2. Data import with tabulizer package

3. Coded analysis, with shared practices, version control

4. Sync online (versioned code)

5. Available online for reference or use

71 of 82

1. Ocean Health Index embraces data science principles

RMarkdown displayed as webpages or websites

72 of 82

Working openly online - for science and communication: github.com/ohi-science

Global assessment!

Manuscript website!

Training e-book!

OHI+ assessment!

Science website!

Collaborators!

73 of 82

Analyses – R code and text together (R Markdown)

74 of 82

Science website: ohi-science.org

75 of 82

Interactive websites for published articles:

http://ohi-science.nceas.ucsb.edu/plos_change_in_global_ocean_health

76 of 82

1. Ocean Health Index embraces data science principles

www.rstudio.com/resources/cheatsheets

77 of 82

1. Ocean Health Index embraces data science principles

Naming files (Bryan 2015)

machine readable,

human readable,

play well with default ordering
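
One hypothetical naming scheme that tries to follow these three principles is sketched below in Python. The field layout (ISO date, short descriptive slug, zero-padded step number) and the example names are assumptions made for illustration, not a scheme taken from the Ocean Health Index repositories.

```python
from datetime import date


def make_filename(day, slug, step, ext):
    """Build a name such as 2018-02-27_ohi-scores_02.csv (hypothetical scheme)."""
    # ISO 8601 dates and zero-padded step numbers sort correctly as plain text.
    # Underscores separate fields; hyphens separate words within a field.
    return f"{day.isoformat()}_{slug}_{step:02d}.{ext}"


names = [
    make_filename(date(2018, 2, 27), "ohi-scores", 10, "csv"),
    make_filename(date(2018, 2, 27), "ohi-scores", 2, "csv"),
    make_filename(date(2017, 11, 5), "ohi-scores", 1, "csv"),
]

# Default (alphabetical) ordering now matches date order, then step order.
ordered = sorted(names)
for name in ordered:
    print(name)

# Machine readable: the fields can be recovered by splitting on "_".
day_text, slug, rest = ordered[0].split("_")
```

Without the zero padding, step 10 would sort before step 2, which is the kind of surprise that "play well with default ordering" is meant to avoid.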

78 of 82

Open science is the biggest opportunity for scientific advancement

79 of 82

80 of 82

Why we do science: To make a difference

81 of 82

Data skills are an unmet need

A survey of 704 US National Science Foundation principal investigators in the biological sciences found training in data skills to be the largest unmet need

Barone et al. 2017

Not just for “big data”: much of this is good practice for bookkeeping, organization, and remembering what you did. It matters even if your only collaborator at the moment is Future You.

82 of 82

1. Ocean Health Index embraces data science principles

Communication:

Websites, books, and more

File naming:

machine readable,

human readable,

play well with default ordering