1 of 32

BIAFLOWS: Benchmarking & Deploying Reproducible BioImage Analysis Workflows on the web

Defragmentation Training School

10th May 2023

Volker Baecker (MRI, Biocampus Montpellier)

Benjamin Pavie (VIB BioImaging Core)

Sébastien Tosi (Danish BioImaging-INFRA IACF)

A project developed within NEUBIAS

COST Action CA 15124

https://neubias.org

2 of 32

BIAFLOWS

1. Reproducibility (Software) Volker

2. BIAFLOWS architecture Volker

3. BIAFLOWS interface, content and demo Sébastien

4. Adding new content Benjamin

(Problems, Images and Annotations)

5. Adding a new workflow to a Problem Volker

6. Future developments Sébastien

3 of 32

What does reproducibility mean?

“Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator.” [1][2]

[1] Goodman, S.N., Fanelli, D., and Ioannidis, J.P.A. (2016). What does research reproducibility mean? Sci. Transl. Med. 8, 341ps12.

[2] K. Bollen, J. T. Cacioppo, R. Kaplan, J. Krosnick, J. L. Olds, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science (National Science Foundation, Arlington, VA, 2015).

4 of 32

Vocabulary

  • Repeat: the same lab runs the same experiment with the same setup
  • Replicate: an independent lab runs the same experiment with the same setup
  • Reproduce: an independent lab varies the experiment or setup
  • Reuse: an independent lab runs a different experiment

Full reproducibility is in general impossible because of software rot!

What is Reproducibility? The R* Brouhaha, Professor Carole Goble, The University of Manchester, UK; Software Sustainability Institute UK; Alan Turing Institute Symposium on Reproducibility, Sustainability and Preservation, 6-7 April 2016, Oxford, UK

5 of 32

Bio-image analysis in a scientific project

  • Hypothesis about biological objects
  • Test by imaging
    • sample preparation / experiment
    • image acquisition
    • bio-image analysis (BIA)
      • extracts information about objects in the images
    • data analysis
      • statistical analysis, data mining
  • Conclusion

6 of 32

Why should it be reproducible?

  • Science should be reproducible
    • A result is only accepted after it has been reproduced by independent groups
  • Reproducibility makes it possible to find errors in the analysis
    • avoids wasting time building on erroneous conclusions
  • If the BIA is reproducible, it is potentially reusable

7 of 32

Who wants to reproduce/reuse?

  • A reviewer of a publication wants to review it
  • A bioimage analyst / biologist wants to use it on their own images
  • A developer wants to build a tool out of it
  • The author (or group) wants to use it again later

8 of 32

What are the problems?

  1. The algorithm is not available

"We used ImageJ for image analysis."

  2. The software (script, macro, plugin, ...) is not available

the algorithm is described, but has to be reimplemented:

      • is it described in enough detail?
      • how much time does it take to reimplement?
      • how close will the reimplementation be to the original?

  3. The software is available but there is no documentation

how to install, configure and run the software?

9 of 32

More problems...

  4. The original data is not available and/or the parameters used are not communicated

if it does not work out of the box on my data, is it because it is broken or because my data differs?

  5. Everything is available, but the software does not work in my environment
      • differences in the OS
      • differences in library versions
      • incompatible hardware
      • ...

  6. The software is outdated and cannot run on a modern OS

10 of 32

What is the solution?

  1. Make everything available:
    a. the analysis workflow (e.g. script, macro)
    b. the environment:
       • OS
       • libraries
       • software platform (FIJI, Icy)
       • plugins, ... with their *versions*
    c. documentation (how and *why*)
    d. the data and the parameters used in the analysis
  2. Publish it in a public repository (on the web)
  3. Keep and archive a local copy

compare: Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285

11 of 32

Is the solution realistic?

  • The solution sounds pretty demanding, but

    • when writing and testing the analysis workflow, almost everything is already at hand; the documentation is often the exception

    • recent tools and technologies can help make this real!

12 of 32

Tools

  1. Git and GitHub

source code and (to some extent) document management, and version control

  2. Docker and Singularity ("containerization")

OS-level virtualization

  3. Jupyter

interactive notebooks

  4. A public repository for the images

IDR (OMERO), Zenodo (OpenAIRE)

  5. A public repository for everything

BIAFLOWS (NEUBIAS)

13 of 32

Git and GitHub

  • Git
    • a distributed version control system (VCS)
    • good for source code or any text files
    • keeps track of changes and allows creating versioned releases
  • GitHub
    • a popular public host for Git repositories
    • a web application to browse and view the content
    • the de-facto standard for open-source software projects
    • GitHub Actions (e.g. build a Docker image on each new release)
    • many tools built around it

14 of 32

Docker and DockerHub

  • Docker
    • delivers software in containers
    • isolated software environments (OS, libraries, config files, ...)
    • containers share one Linux kernel (lightweight compared to VMs)
    • incremental builds (common parts are not duplicated)
    • created from a Dockerfile
  • DockerHub
    • a public host for Docker images
    • images can be served and downloaded to run anywhere
    • uploads to DockerHub can be automated from GitHub

15 of 32

For reproducibility:

a. the analysis workflow (scripts, macros)

[Screenshots: an ImageJ macro; the workflow on GitHub; workflow versioning; the executable workflow on BIAFLOWS]

16 of 32

For reproducibility:

b. the environment: the OS, libraries, the software platform (FIJI, Icy), plugins, ... with their *versions*

[Screenshots: a workflow on GitHub; the Dockerfile defining the environment; the environment as a public Docker image on DockerHub]
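A minimal sketch of what such a Dockerfile could look like for a Python-based workflow. All names, versions and file paths here are illustrative assumptions, not taken from an actual BIAFLOWS workflow:

```dockerfile
# Hypothetical sketch: environment for a Python-based workflow.
# Base image, library versions and file names are illustrative.
FROM python:3.9-slim

# Pin library versions so the environment stays reproducible
RUN pip install numpy==1.24.3 scikit-image==0.21.0

# Copy the wrapper and the workflow into the image
ADD wrapper.py /app/wrapper.py
ADD workflow.py /app/workflow.py

# BIAFLOWS-style entry point: the wrapper receives parameters
# and input/output folders on the command line
ENTRYPOINT ["python", "/app/wrapper.py"]
```

Because each instruction is a cached layer, rebuilding after a change to `workflow.py` only re-runs the `ADD` steps, not the library installation.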

17 of 32

For reproducibility:

c. documentation (how and *why*)

[Screenshots: the workflow descriptor; a workflow on GitHub; accessing the code and environment of a workflow directly from BIAFLOWS]

BIAFLOWS itself is documented: https://biaflows-doc.neubias.org

18 of 32

For reproducibility:

d. the data and the parameters used in the analysis

[Screenshots: datasets/problems ... with ground truth; the parameters used in the analysis]

19 of 32

BIAFLOWS Magic:

The Wrapper Script

20 of 32

BIAFLOWS architecture

21 of 32

Yes, we are inclusive!

A workflow should fulfil a few soft requirements for its integration:

Input: process a folder of OME-TIFF images
Output: results in a predefined format (e.g. TIFF binary masks)
Call: parse parameters and input/output folders from the command line
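The call convention above (parameters and input/output folders passed on the command line) can be sketched as follows. The flag names `--infolder`, `--outfolder` and `--radius` are illustrative assumptions, not BIAFLOWS' actual flags:

```python
import argparse

def parse_workflow_args(argv):
    """Parse I/O folders and workflow parameters from a command line.
    Flag names are illustrative, not the official BIAFLOWS ones."""
    parser = argparse.ArgumentParser(description="Toy workflow call convention")
    parser.add_argument("--infolder", required=True,
                        help="folder of input OME-TIFF images")
    parser.add_argument("--outfolder", required=True,
                        help="folder for the result masks")
    parser.add_argument("--radius", type=int, default=5,
                        help="example workflow-specific parameter")
    return parser.parse_args(argv)

# Example of a call as the platform might issue it (values are made up):
args = parse_workflow_args(
    ["--infolder", "/data/in", "--outfolder", "/data/out", "--radius", "3"])
print(args.infolder, args.outfolder, args.radius)
```

Keeping all workflow-specific parameters on the command line is what lets BIAFLOWS run the container non-interactively with recorded parameter values.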

A workflow is fully defined by 4 files stored on GitHub:

  • the workflow itself (IJ macro, Python script, compiled source code, ...): workflow specific
  • a Dockerfile: sets the OS / software execution environment
  • a wrapper script: drives the sequence of BIAFLOWS <--> workflow interactions (mostly common to a BIA platform/ecosystem)
  • a descriptor: defines the input parameters + their description

22 of 32

Let’s get started with BIAFLOWS!

Sandbox: https://biaflows.neubias.org

Get help: https://image.sc

23 of 32

BIAFLOWS Problem Classes

Problem Class            | Sample Problem           | Annotations
-------------------------|--------------------------|---------------------------------
Object Counting          | Vesicles 2D/3D           | Binary masks (dots)
Object Detection         | Vesicles 2D/3D           | Binary masks (dots)
Landmark Detection       | Drosophila wing vertices | Label masks (dots)
Object Segmentation      | Nuclei 2D/3D             | Label masks
Pixel Classification     | Tumor, gland             | Label masks
Filament Tree Tracing    | Neurons                  | SWC
Filament Network Tracing | Blood vessels            | Skeleton binary masks
Particle Tracking        | Non-dividing nuclei      | Label masks
Object Tracking          | Dividing nuclei          | Label masks + division text file

24 of 32

BIAFLOWS: A web tool supercharging benchmarking

Store annotated datasets illustrating key BIA problems

Standard data formats and metrics for 9 BIA problem classes

(e.g. object segmentation)

Versioned workflows + full execution environment

Default parameters optimized for the datasets

Visualize workflow results + compute benchmark metrics

Explore benchmark metrics (statistics or per image)
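For the object segmentation problem class, a common benchmark metric is the Dice coefficient between a predicted and a ground-truth binary mask. The sketch below, over flat 0/1 lists rather than real images, is an illustration of the idea, not BIAFLOWS' actual metric code:

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 lists.
    Dice = 2*|P intersect T| / (|P| + |T|); 1.0 means perfect overlap."""
    assert len(pred) == len(truth)
    intersection = sum(p and t for p, t in zip(pred, truth))
    size_sum = sum(pred) + sum(truth)
    if size_sum == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / size_sum

# Toy example: 3 of 4 foreground pixels agree in each mask
pred  = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 1, 1, 0, 0]
print(dice_coefficient(pred, truth))  # 2*3/(4+4) = 0.75
```

Computing such metrics server-side against the stored ground truth is what allows BIAFLOWS to compare workflows on equal footing.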

25 of 32

Adding New Content to BIAFLOWS�(Problems, Images and Annotations)

26 of 32

Adding new Workflows to BIAFLOWS

27 of 32

Computer Vision

ImageNet (Stanford)

  • > 14 M images (2023)
  • > 1 M images with bounding boxes
  • 1,000 classes (cat, dog, ...)

Common Objects in Context (COCO, Microsoft)

  • > 328 K images
  • 2.5 M labeled object instances
  • 91 object classes

  • Articles dedicated to workflows and comparison to other methods
  • Virtually all published workflows benchmarked on some reference datasets
  • Developers benchmark workflows independently (re-implement other methods)
  • Workflows can be difficult to reuse / adapt (compilation, parameters...)

28 of 32

Image Analysis Challenges

  • 171 challenges (May 2023)
  • 2,170 CV datasets (May 2023)
    • 7 with keyword "microscopy"
    • 30 with modality "histology"
    • 5 with modality "Transmission EM"

29 of 32

Grand-Challenge�Some Limitations

  • Low representation of biological microscopy datasets (virtually none as of 2023)
  • Same problem -> possibly different annotation formats & metrics
  • Algorithms require authorized access + not all are containerized

  • No support for remotely displaying images and annotations from the web UI
  • No support for per-image metrics or dynamically computed statistics
  • No cloud-managed sandbox

Grand-Challenge was designed as a challenge platform, not as an open image repository and open distribution platform for image analysis workflows.

30 of 32

BIAFLOWS / Cytomine-Community

Roadmap

Still a prototype:

Small user community, low number of datasets (mostly synthetic)

Core Features

  • Extended support for parsable metadata, dynamic projects
  • Isolate workflow execution from data handling and metrics computation
  • Interoperability with other platforms such as Grand-Challenge
  • Simplify workflow integration (wizard from UI)
  • External image storage (in S3)

Explorative

  • Generic workflow container for object segmentation / detection with BioImage Model Zoo models
  • Extended parameter and I/O formats to chain workflows
  • Automated tiling and results merging for large images

31 of 32

Acknowledgements

CORE DEVELOPERS

Ulysse Rubens (ULiège),

Romain Mormont (ULiège)

Volker Baecker (MRI, Biocampus Montpellier)

Lassi Paavolainen (FIMM, HiLIFE, UHelsinki)

Benjamin Pavie (VIB), Leandro Aluisio Scholz (UFPR)

Raphaël Marée (ULiège)

Sébastien Tosi (IRB Barcelona)

CONTRIBUTORS

Ba Thien (ULiège), Renaud Hoyoux (Cytomine SCRL FS), Devrim Ünay (IUE), Gino Michiels (HEPL/ULiège), Anatole Chessel (École Polytechnique), Martin Maška (Masaryk University), Rémy Vandaele (ULiège), Stefan Stanciu (Politehnica Bucharest), Ofra Golani (Weizmann Institute of Science), Graeme Ball (University of Dundee), Nataša Sladoje (Uppsala University), Perrine Paul-Gilloteaux (CNRS SFR Bonamy Nantes)
