1 of 26

New advances in the development of echinopscis, an extensible notebook for open science on specimens

Nicky Nicolson, Eve J Lucas

Royal Botanic Gardens, Kew

Symposium: “Taxonomy as open science: tool support to facilitate data use for hands-on practitioners”

XX International Botanical Congress

26th July 2024

2 of 26

  • Transitioned from software development into research
  • How we can use software development practices in research:
    • Reuse
    • Automation
    • Version control
    • Dependency management
    • Continuous integration
  • Also processes about communication, design & inclusion
  • Open science, take-up & how we design & build for participation

Context: personal & institutional

Image: Jesse Orrico / Unsplash

3 of 26

Proposed early 2000s

Systematics Association special issue 2008

Since then:

  • We do everything online

  • Explosion of data availability

  • Recognised different roles in research

e-taxonomy – moving taxonomic activities online

Image: RBG Kew

4 of 26

Trends in data availability

5 of 26

  • Online & remote collaboration
  • Awareness of different roles in research
    • research software engineer role
    • data steward
  • Skills development (Data Carpentry & Software Carpentry)
  • Recognition of different activities required for successful research
  • Open science: more than open data

Wider context: evolving research culture

6 of 26

  • Prototyping tools to enable taxonomic research using specimens
    • https://echinopscis.github.io
  • An open science experiment, participatory design:
    • Local working environment
    • Open to choose your workflow
    • Access to:
      • Specimens
      • Names
      • Taxonomy
      • Literature
      • Institutional profiles
      • People
  • Built through the Open Life Sciences (OLS) mentoring / ambassador program https://openlifesci.org

e-taxonomy: what tools & practices do we need?

7 of 26

Where to put our efforts?

Browser

8 of 26

API

9 of 26

Something else?

10 of 26

  • A personal knowledge manager: for creating & linking research notes
  • Emphasises linking
  • Data stored locally, using open formats
    • Markdown and optional structured data frontmatter
  • Works offline
  • Extensible architecture – plugins for data access and citation processing
  • Active user and developer community

…sounds a lot like OpenRefine, which we have adopted with some success

Could this contribute to our management of linked, semi-structured data, as OpenRefine has for tabular data?
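To make "Markdown and optional structured data frontmatter" concrete, here is a minimal sketch of reading such a note with the Python standard library only. The note content, the frontmatter field names and the `[[...]]` link syntax are all assumptions for illustration; the parser handles only flat `key: value` pairs, not full YAML.

```python
import re

# Hypothetical specimen note: field names and values are invented for illustration.
NOTE = """---
collector: A. Collector
record_number: 1234
herbarium: K
---
Collected near the river. See also [[Myrtaceae]] and [[Field trip 2024]].
"""

def split_frontmatter(text):
    """Return (metadata dict, body) for a note with optional '---' frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole note is body text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat as plain body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # flat 'key: value' pairs only, not full YAML
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body

def find_links(body):
    """List the targets of [[wikilink]]-style links (an assumed link syntax)."""
    return re.findall(r"\[\[([^\]|]+)", body)

meta, body = split_frontmatter(NOTE)
links = find_links(body)
print(meta)
print(links)
```

Because the note is plain text in an open format, the same file stays readable by any other tool, which is the point of storing data locally.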

11 of 26

12 of 26

Open by default:

    • “Find our project on GitHub”

Open by design:

    • Our aim is …
    • Our roadmap is …
    • You can contribute by …
    • Submit issues here …
    • Expect this kind of behaviour …
    • Our decision-making process is …

= “Find our project on GitHub” with all these pointers

Open science: what kind of “open”?

Image: Dima Pechurin / Unsplash

13 of 26

  • Wikidata plugin:
    • Consult remote information resource in-context

echinopscis demos (1/5)
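As a rough illustration of consulting a remote resource in context, the sketch below builds a search request for the public Wikidata API (`wbsearchentities`) and summarises a response. It is not the plugin's code: the function names are hypothetical, and the sample response is invented (the ID `Q0` is a placeholder) so the example runs without network access.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(term, language="en", limit=5):
    """Build a wbsearchentities request URL for a highlighted term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

def summarise_results(payload):
    """Reduce a search response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]

# Invented sample response: the entity ID and text are placeholders.
SAMPLE = json.loads(
    '{"search": [{"id": "Q0", "label": "Example taxon", '
    '"description": "placeholder entry"}]}'
)

url = build_search_url("Myrcia")
rows = summarise_results(SAMPLE)
print(url)
print(rows)
```

To run it against the live service, the URL could be fetched with `urllib.request.urlopen` and the body passed through `json.load` before calling `summarise_results`.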

14 of 26

15 of 26

  • Glossary plugin:
    • Highlight a term, get a definition and illustration

echinopscis demos (2/5)

16 of 26

Glossary

17 of 26

  • Phylo plugin:
    • Define a Newick code block in a document
    • Use leaves and internal nodes to search notes

echinopscis demos (3/5)
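The idea of using leaves and internal nodes as search terms can be sketched as follows. This is an assumption about the general approach, not the plugin's implementation: the tree, the node names and the note titles are invented, and the scanner ignores quoted labels and comments, which full Newick allows.

```python
def newick_labels(newick):
    """Return (leaf labels, internal node labels) from a Newick string."""
    leaves, internal = [], []
    prev = ""  # last structural character seen: "(", ",", ")" or "" at start
    text = newick.strip().rstrip(";")
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "(),":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip a branch length up to the next structural character.
            while i < len(text) and text[i] not in "(),":
                i += 1
        else:
            start = i
            while i < len(text) and text[i] not in "(),:":
                i += 1
            label = text[start:i].strip()
            if label:
                # A label directly after ")" names an internal node.
                (internal if prev == ")" else leaves).append(label)
    return leaves, internal

# Invented tree: names are placeholders, not a published phylogeny.
TREE = "((sp_a:0.1,sp_b:0.2)clade_1:0.3,(sp_c,sp_d)clade_2)root;"
leaves, internal = newick_labels(TREE)

# Hypothetical note titles; each label is used as a search term.
note_titles = {"sp_a", "clade_2", "unrelated note"}
matches = [label for label in leaves + internal if label in note_titles]
print(leaves, internal, matches)
```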

18 of 26

Phylo

19 of 26

  • “Co-pilot” plugin
    • Reformat your data with a large language model assistant
    • Text paragraph of specimens to Darwin Core format

echinopscis demos (4/5)

20 of 26

You are a diligent scientific research assistant. You transcribe information accurately without including new data for which there is no basis.

If you receive the instruction "specimen table" you will try to make a markdown table from my input with the following headers: country, region, locality, latitude, longitude, date, collector, record number, herbarium code(s).

If you receive the instruction "dwc specimen table" you will try to use Darwin Core terms for the table headers where appropriate. Darwin Core is a data standard which defines terms used in biodiversity informatics data sharing.

Co-pilot custom prompt – specimen tables
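For comparison with what the language model is asked to do, the header renaming can be written deterministically. The sketch below maps the prompt's plain headers to Darwin Core terms; the terms themselves exist in the standard, but which term fits each header is an assumption (for example `stateProvince` for region and `institutionCode` for herbarium code), and the sample row is invented.

```python
# Assumed mapping from the prompt's headers to Darwin Core terms.
DWC_TERMS = {
    "country": "country",
    "region": "stateProvince",
    "locality": "locality",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "date": "eventDate",
    "collector": "recordedBy",
    "record number": "recordNumber",
    "herbarium code(s)": "institutionCode",
}

def to_dwc(row):
    """Rename a row's keys to Darwin Core terms, leaving unknown keys as-is."""
    return {DWC_TERMS.get(key, key): value for key, value in row.items()}

def markdown_table(headers, rows):
    """Render rows (dicts) as a markdown table with the given column order."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(row.get(h, "") for h in headers) + " |")
    return "\n".join(lines)

# Invented specimen record for illustration only.
row = {
    "country": "Brazil",
    "locality": "roadside near river",
    "collector": "A. Collector",
    "record number": "1234",
    "herbarium code(s)": "K",
}
dwc_row = to_dwc(row)
table = markdown_table(list(DWC_TERMS.values()), [dwc_row])
print(table)
```

A fixed mapping like this is also a convenient check on the model's output: any header it returns that is not among the expected terms can be flagged for review.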

21 of 26

Co-pilot

22 of 26

  • “Co-pilot” plugin
    • Text species descriptions to trait matrices

echinopscis demos (4/5)

23 of 26

You are a diligent scientific research assistant. You transcribe information accurately without including new data for which there is no basis.

When you receive the instruction "trait table" you will reply "here is your trait table" and provide a table of the descriptive traits included in the species description. For example if you receive a species description with the sentence "stems 3-5mm in diameter, green, pubescent" you will output a table with column headers "character" and "value" and the rows

stem_diameter,3-5mm

stem_color,green

stem_description,pubescent

Co-pilot custom prompt – trait extraction
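One way the "character, value" rows could be assembled into a trait matrix is sketched below. It assumes the model returns comma-separated rows like those in the prompt's example, tolerates a stray trailing comma, and uses a placeholder taxon name; none of this is taken from the plugin itself.

```python
import csv
import io

# Rows as a language model might return them for one species description.
RAW_ROWS = """stem_diameter,3-5mm
stem_color,green,
stem_description,pubescent
"""

def parse_trait_rows(text):
    """Turn 'character,value' lines into a {character: value} dict."""
    traits = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        if len(cells) >= 2:  # skip blank or malformed lines
            traits[cells[0]] = ", ".join(cells[1:])
    return traits

def trait_matrix(per_taxon):
    """Build a markdown matrix: one row per taxon, one column per character."""
    characters = sorted({c for traits in per_taxon.values() for c in traits})
    lines = [
        "| taxon | " + " | ".join(characters) + " |",
        "| --- | " + " | ".join("---" for _ in characters) + " |",
    ]
    for taxon, traits in per_taxon.items():
        cells = [traits.get(c, "") for c in characters]
        lines.append("| " + taxon + " | " + " | ".join(cells) + " |")
    return "\n".join(lines)

traits = parse_trait_rows(RAW_ROWS)
# "Species A" is a placeholder name, not a real taxon.
matrix = trait_matrix({"Species A": traits})
print(matrix)
```

Adding further taxa to the dictionary passed to `trait_matrix` extends the matrix by one row each; characters missing from a description are left as empty cells.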

24 of 26

25 of 26

Interested? Checklist of some practical actions

  • Review these slides & the demos: https://bit.ly/IBC2024-nicolson
  • Look at the echinopscis website & try it out: https://echinopscis.github.io
  • Sign up to GitHub: https://github.com/signup
    • Bonus: link your GitHub profile to your ORCID
  • Participate in the discussion board: https://github.com/orgs/echinopscis/discussions
  • Think: if we made a pre-populated echinopscis vault for your group, what data should we include? What kind of collaboration features do you need?

Nicolson & Lucas: New advances in the development of echinopscis, an extensible notebook for open science on specimens.

Slides: https://bit.ly/IBC2024-nicolson

26 of 26

Illustrations created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807