1 of 32

BIAFLOWS: Benchmarking & Deploying Reproducible BioImage Analysis Workflows on the web

Defragmentation Training School

10th May 2023

Volker Baecker (MRI, Biocampus Montpellier)

Benjamin Pavie (VIB BioImaging Core)

Sébastien Tosi (Danish BioImaging-INFRA IACF)

A project developed within NEUBIAS

COST Action CA 15124

https://neubias.org

2 of 32

BIAFLOWS

1. Reproducibility (Software) Volker

2. BIAFLOWS architecture Volker

3. BIAFLOWS interface, content and demo Sébastien

4. Adding new content Benjamin

(Problems, Images and Annotations)

5. Adding a new workflow to a Problem Volker

6. Future developments Sébastien

3 of 32

What does reproducibility mean?

“Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator.” [1][2]

[1] Goodman, S.N., Fanelli, D., and Ioannidis, J.P.A. (2016). What does research reproducibility mean? Sci. Transl. Med. 8, 341ps12.

[2] K. Bollen, J. T. Cacioppo, R. Kaplan, J. Krosnick, J. L. Olds, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science (National Science Foundation, Arlington, VA, 2015).

4 of 32

Vocabulary

  • Repeat: the same lab runs the same experiment with the same setup
  • Replicate: an independent lab runs the same experiment with the same setup
  • Reproduce: an independent lab varies the experiment or setup
  • Reuse: an independent lab runs a different experiment

Full reproducibility is in general impossible because of software rot!

What is Reproducibility? The R* Brouhaha, Professor Carole Goble, The University of Manchester, UK; Software Sustainability Institute UK; Alan Turing Institute Symposium on Reproducibility, Sustainability and Preservation, 6-7 April 2016, Oxford, UK

5 of 32

Bio-image analysis in a scientific project

  • Hypothesis about biological objects
  • Test by imaging
    • sample preparation / experiment
    • image acquisition
    • bio-image analysis (BIA)
      • extracts information about objects in the images
    • data analysis
      • statistical analysis, data mining
  • Conclusion

6 of 32

Why should it be reproducible?

  • Science should be reproducible
    • A result is only accepted after it has been reproduced by independent groups
  • Reproducibility makes it possible to find errors in the analysis
    • avoids wasting time building on erroneous conclusions
  • If the BIA is reproducible, it is potentially reusable

7 of 32

Who wants to reproduce/reuse?

  • A reviewer of a publication wants to review it
  • A bioimage analyst / biologist wants to use it on their own images
  • A developer wants to build a tool out of it
  • The author (or group) wants to use it again later

8 of 32

What are the problems?

  1. The algorithm is not available

"We used ImageJ for image analysis."

  2. The software (script, macro, plugin, ...) is not available

the algorithm is described, but has to be reimplemented:

      • is it described in enough detail?
      • how much time does it take to reimplement?
      • how close will the reimplementation be to the original?

  3. The software is available but there is no documentation

how to install, configure and run the software?

9 of 32

More problems...

  4. The original data is not available and/or the parameters used are not communicated

if it does not work out of the box on my data, is it because it is broken or because my data differs?

  5. Everything is available, but the software does not work in my environment
      • differences in the OS
      • differences in library versions
      • incompatible hardware
      • ...

  6. The software is outdated and cannot run on a modern OS

10 of 32

What is the solution?

  1. Make everything available:
    a. the analysis workflow (e.g. script, macro)
    b. the environment:
       • OS
       • libraries
       • software platform (FIJI, Icy)
       • plugins, ... with their *versions*
    c. documentation (how and *why*)
    d. the data and the parameters used in the analysis
  2. Publish it in a public repository (on the web)
  3. Keep and archive a local copy

compare: Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285

11 of 32

Is the solution realistic?

  • The solution sounds pretty demanding, but

    • when writing and testing the analysis workflow, almost everything is already at hand; the documentation is often the exception

    • recent tools and technologies can help make this real!

12 of 32

Tools

  1. Git and GitHub

source code and (to some extent) document management, and version control

  2. Docker and Singularity ("containerization")

OS-level virtualization

  3. Jupyter

interactive notebooks

  4. A public repository for the images

IDR (OMERO), Zenodo (OpenAIRE)

  5. A public repository for everything

BIAFLOWS (NEUBIAS)

13 of 32

Git and GitHub

  • Git
    • a distributed version control system (VCS)
    • good for source code or any text files
    • keeps track of changes and allows creating versioned releases
  • GitHub
    • a popular public host for Git repositories
    • a web application to browse and view the content
    • the de-facto standard for open-source software projects
    • GitHub Actions (e.g. build a Docker image on each new release)
    • many tools built around it

14 of 32

Docker and DockerHub

  • Docker
    • delivers software in containers
    • isolated software environments (OS, libraries, config files, ...)
    • containers share one Linux kernel (lightweight compared to VMs)
    • incremental builds (common parts are not duplicated)
    • created from a Dockerfile
  • DockerHub
    • a public host for Docker images
    • images can be served and downloaded to run anywhere
    • uploads to DockerHub can be automated from GitHub

15 of 32

For reproducibility:

a. the analysis workflow (scripts, macros)

[Screenshots: an ImageJ macro; the workflow on GitHub; workflow versioning; the executable workflow on BIAFLOWS]

16 of 32

For reproducibility:

b. the environment: the OS, libraries, the software platform (FIJI, Icy), plugins, ... with their *versions*

[Screenshots: a workflow on GitHub; the Dockerfile defining the environment; the environment as a public Docker image on DockerHub]
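A minimal sketch of what such a Dockerfile could look like for a Python-based workflow. All names, versions and file paths here are illustrative assumptions, not taken from an actual BIAFLOWS workflow:

```dockerfile
# Hypothetical sketch: environment for a Python-based workflow.
# Base image, library versions and file names are illustrative.
FROM python:3.9-slim

# Pin library versions so the environment stays reproducible
RUN pip install numpy==1.24.3 scikit-image==0.21.0

# Copy the wrapper and the workflow into the image
ADD wrapper.py /app/wrapper.py
ADD workflow.py /app/workflow.py

# BIAFLOWS-style entry point: the wrapper receives parameters
# and input/output folders on the command line
ENTRYPOINT ["python", "/app/wrapper.py"]
```

Because each instruction is a cached layer, rebuilding after a change to `workflow.py` only re-runs the `ADD` steps, not the library installation.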

17 of 32

For reproducibility:

c. documentation (how and *why*)

[Screenshots: the workflow descriptor; a workflow on GitHub; accessing the code and environment of a workflow directly from BIAFLOWS]

BIAFLOWS itself is documented: https://biaflows-doc.neubias.org

18 of 32

For reproducibility:

d. the data and the parameters used in the analysis

[Screenshots: datasets/problems ... with ground truth; the parameters used in the analysis]

19 of 32

BIAFLOWS Magic:

The Wrapper Script

20 of 32

BIAFLOWS architecture

21 of 32

Yes, we are inclusive!

A workflow should fulfil a few soft requirements for its integration:

Input: process a folder of OME-TIFF images
Output: results in a predefined format (e.g. TIFF binary masks)
Call: parse parameters and input/output folders from the command line
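The call convention above (parameters and input/output folders passed on the command line) can be sketched as follows. The flag names `--infolder`, `--outfolder` and `--radius` are illustrative assumptions, not BIAFLOWS' actual flags:

```python
import argparse

def parse_workflow_args(argv):
    """Parse I/O folders and workflow parameters from a command line.
    Flag names are illustrative, not the official BIAFLOWS ones."""
    parser = argparse.ArgumentParser(description="Toy workflow call convention")
    parser.add_argument("--infolder", required=True,
                        help="folder of input OME-TIFF images")
    parser.add_argument("--outfolder", required=True,
                        help="folder for the result masks")
    parser.add_argument("--radius", type=int, default=5,
                        help="example workflow-specific parameter")
    return parser.parse_args(argv)

# Example of a call as the platform might issue it (values are made up):
args = parse_workflow_args(
    ["--infolder", "/data/in", "--outfolder", "/data/out", "--radius", "3"])
print(args.infolder, args.outfolder, args.radius)
```

Keeping all workflow-specific parameters on the command line is what lets BIAFLOWS run the container non-interactively with recorded parameter values.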

A workflow is fully defined by 4 files stored on GitHub:

  • the workflow itself (IJ macro, Python script, compiled source code, ...): workflow specific
  • a Dockerfile: sets the OS / software execution environment
  • a wrapper script: drives the sequence of BIAFLOWS <--> workflow interactions (mostly common to a BIA platform/ecosystem)
  • a descriptor: defines the input parameters + their description

22 of 32

Let’s get started with BIAFLOWS!

Sandbox: https://biaflows.neubias.org

Get help: https://image.sc

23 of 32

BIAFLOWS Problem Classes

Problem Class            | Sample Problem           | Annotations
-------------------------|--------------------------|---------------------------------
Object Counting          | Vesicles 2D/3D           | Binary masks (dots)
Object Detection         | Vesicles 2D/3D           | Binary masks (dots)
Landmark Detection       | Drosophila wing vertices | Label masks (dots)
Object Segmentation      | Nuclei 2D/3D             | Label masks
Pixel Classification     | Tumor, gland             | Label masks
Filament Tree Tracing    | Neurons                  | SWC
Filament Network Tracing | Blood vessels            | Skeleton binary masks
Particle Tracking        | Non-dividing nuclei      | Label masks
Object Tracking          | Dividing nuclei          | Label masks + division text file

24 of 32

BIAFLOWS: A web tool supercharging benchmarking

Store annotated datasets illustrating key BIA problems

Standard data formats and metrics for 9 BIA problem classes

(e.g. object segmentation)

Versioned workflows + full execution environment

Default parameters optimized for the datasets

Visualize workflow results + compute benchmark metrics

Explore benchmark metrics (statistics or per image)
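For the object segmentation problem class, a common benchmark metric is the Dice coefficient between a predicted and a ground-truth binary mask. The sketch below, over flat 0/1 lists rather than real images, is an illustration of the idea, not BIAFLOWS' actual metric code:

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 lists.
    Dice = 2*|P intersect T| / (|P| + |T|); 1.0 means perfect overlap."""
    assert len(pred) == len(truth)
    intersection = sum(p and t for p, t in zip(pred, truth))
    size_sum = sum(pred) + sum(truth)
    if size_sum == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / size_sum

# Toy example: 3 of 4 foreground pixels agree in each mask
pred  = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 1, 1, 0, 0]
print(dice_coefficient(pred, truth))  # 2*3/(4+4) = 0.75
```

Computing such metrics server-side against the stored ground truth is what allows BIAFLOWS to compare workflows on equal footing.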

25 of 32

Adding New Content to BIAFLOWS�(Problems, Images and Annotations)

26 of 32

Adding new Workflows to BIAFLOWS

27 of 32

Computer Vision

ImageNet (Stanford)

  • > 14 M images (2023)
  • > 1 M images with bounding boxes
  • 1,000 classes (cat, dog, ...)

Common Objects in Context (COCO, Microsoft)

  • > 328 K images
  • 2.5 M labeled object instances
  • 91 object classes

  • Articles dedicated to workflows and comparison to other methods
  • Virtually all published workflows benchmarked on some reference datasets
  • Developers benchmark workflows independently (re-implement other methods)
  • Workflows can be difficult to reuse / adapt (compilation, parameters...)

28 of 32

Image Analysis Challenges

  • 171 challenges (May 2023)
  • 2,170 CV datasets (May 2023)
    • 7 with keyword "microscopy"
    • 30 with modality "histology"
    • 5 with modality "Transmission EM"

29 of 32

Grand-Challenge�Some Limitations

  • Low representation of biological microscopy datasets (virtually none as of 2023)
  • Same problem -> possibly different annotation formats & metrics
  • Algorithms require authorized access + not all are containerized

  • No support for remotely displaying images and annotations from the web UI
  • No support for per-image metrics or dynamically computed statistics
  • No cloud-managed sandbox

Grand-Challenge was designed as a challenge platform, not as an open image repository and open distribution platform for image analysis workflows.

30 of 32

BIAFLOWS / Cytomine-Community

Roadmap

Still a prototype:

Small user community, low number of datasets (mostly synthetic)

Core Features

  • Extended support for parsable metadata, dynamic projects
  • Isolate workflow execution from data handling and metrics computation
  • Interoperability with other platforms such as Grand-Challenge
  • Simplify workflow integration (wizard from UI)
  • External image storage (in S3)

Explorative

  • Generic workflow container for object segmentation / detection with BioImage Model Zoo models
  • Extended parameter and I/O formats to chain workflows
  • Automated tiling and results merging for large images

31 of 32

Acknowledgements

CORE DEVELOPERS

Ulysse Rubens (ULiège),

Romain Mormont (ULiège)

Volker Baecker (MRI, Biocampus Montpellier)

Lassi Paavolainen (FIMM, HiLIFE, UHelsinki)

Benjamin Pavie (VIB), Leandro Aluisio Scholz (UFPR)

Raphaël Marée (ULiège)

Sébastien Tosi (IRB Barcelona)

CONTRIBUTORS

Ba Thien (ULiège), Renaud Hoyoux (Cytomine SCRL FS), Devrim Ünay (IUE), Gino Michiels (HEPL/ULiège), Anatole Chessel (École Polytechnique), Martin Maška (Masaryk University), Rémy Vandaele (ULiège), Stefan Stanciu (Politehnica Bucharest), Ofra Golani (Weizmann Institute of Science), Graeme Ball (University of Dundee), Nataša Sladoje (Uppsala University), Perrine Paul-Gilloteaux (CNRS SFR Bonamy Nantes)
