BIAFLOWS: Benchmarking & Deploying Reproducible BioImage Analysis Workflows on the web
Defragmentation Training School
10th May 2023
Volker Baecker (MRI, Biocampus Montpellier)
Benjamin Pavie (VIB BioImaging Core)
Sébastien Tosi (Danish BioImaging-INFRA IACF)
A project developed within NEUBIAS
COST Action CA 15124
BIAFLOWS
1. Reproducibility (Software) Volker
2. BIAFLOWS architecture Volker
3. BIAFLOWS interface, content and demo Sébastien
4. Adding new content Benjamin
(Problems, Images and Annotations)
5. Adding a new workflow to a Problem Volker
6. Future developments Sébastien
What does reproducibility mean?
“Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator.” [1][2]
[1] Goodman, S.N., Fanelli, D., and Ioannidis, J.P.A. (2016). What does research reproducibility mean? Sci. Transl. Med. 8, 341ps12-341ps12.
[2] K. Bollen, J. T. Cacioppo, R. Kaplan, J. Krosnick, J. L. Olds, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science (National Science Foundation, Arlington, VA, 2015).
Vocabulary
Full reproducibility is in general impossible because of software rot!
What is Reproducibility? The R* Brouhaha, Professor Carole Goble, The University of Manchester, UK Software Sustainability Institute UK, Alan Turing Institute Symposium Reproducibility, Sustainability and Preservation, 6-7 April 2016, Oxford, UK
Bio-image analysis in a scientific project
Why should it be reproducible?
Who wants to reproduce/reuse?
What are the problems?
"We used ImageJ for image analysis."
the algorithm is described, but has to be reimplemented
how to install, configure the software and run it?
More problems...
if it does not work out of the box on my data, is it because it's broken or because my data differs?
…
What is the solution?
compare: Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285
Is the solution realistic?
Tools
source code and (to some extent) document management and version control
OS-level virtualization
interactive notebooks
IDR (Omero), Zenodo (OpenAIRE)
BIAFLOWS (NEUBIAS)
Git and GitHub
Docker and DockerHub
for reproducibility
Macro ImageJ
Executable workflow on BIAFLOWS
A workflow on GitHub
Workflow versioning
for reproducibility
b. the environment: the OS, libraries, the software platform (FIJI, Icy), plugins, libraries ... with their *versions*
A workflow on GitHub
Dockerfile defining environment
Environment as public docker image on DockerHub
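Recording the versions that make up the environment can be sketched in a few lines of Python. This is only an illustration of the idea of capturing "the OS, libraries ... with their versions"; it is not part of BIAFLOWS, and the package names passed in are hypothetical placeholders.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Return a dict describing the interpreter, the OS and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list: replace with what the workflow actually imports.
    print(json.dumps(snapshot_environment(["numpy", "scikit-image"]), indent=2))
```

A snapshot like this only documents the environment. Pinning it, so that it can be rebuilt, is what the Dockerfile and the public image are for.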
for reproducibility
c. documentation (how and *why*)
Workflow descriptor
A workflow on GitHub
Access code and environment of a workflow from BIAFLOWS directly
BIAFLOWS is documented: https://biaflows-doc.neubias.org
for reproducibility
d. the data and the parameters used in the analysis
Datasets/problems...
Parameters used in the analysis
… with ground truth
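Outside of a platform that stores data and parameters for you, the same information can be kept next to the results. The following is a minimal sketch, assuming the inputs live in one folder. The function names and the record layout are made up for illustration and are not a BIAFLOWS format.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """Checksum a file in chunks so that large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_run_record(input_dir, parameters, record_path):
    """Store the parameters and a checksum of every input file as JSON."""
    inputs = {
        p.name: file_sha256(p)
        for p in sorted(Path(input_dir).iterdir())
        if p.is_file()
    }
    record = {"parameters": parameters, "inputs": inputs}
    Path(record_path).write_text(json.dumps(record, indent=2, sort_keys=True))
    return record
```

With such a record, a later run can check that it uses the same images and the same parameter values before comparing results.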
BIAFLOWS Magic:
The Wrapper Script
architecture
Yes, we are inclusive!
A Workflow should fulfill some soft requirements for its integration
Input: Process a folder of OME-TIFF images
Output: Results in predefined format (e.g. TIFF binary masks)
Call: Parse parameters and input/output folders from command line
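The three soft requirements can be sketched as a small Python command-line skeleton. The flag names (`--infolder`, `--outfolder`, `--radius`) and the `process_image` placeholder are assumptions made for this illustration. The placeholder simply copies each file, where a real workflow would write a result such as a binary mask.

```python
import argparse
import shutil
from pathlib import Path


def parse_args(argv=None):
    """Call: parse the parameters and the input/output folders from the command line."""
    parser = argparse.ArgumentParser(description="Sketch of a folder-based workflow")
    parser.add_argument("--infolder", required=True, type=Path)
    parser.add_argument("--outfolder", required=True, type=Path)
    parser.add_argument("--radius", type=float, default=5.0)  # hypothetical parameter
    return parser.parse_args(argv)


def process_image(src, dst, radius):
    """Placeholder: a real workflow would analyse src and write its result to dst."""
    shutil.copyfile(src, dst)


def run(args):
    """Input: every OME-TIFF in the folder. Output: one result file per image."""
    args.outfolder.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(args.infolder.iterdir()):
        if src.name.lower().endswith((".ome.tif", ".ome.tiff")):
            process_image(src, args.outfolder / src.name, args.radius)
            processed.append(src.name)
    return processed
```

Calling `run(parse_args())` from a script entry point would then process whatever folders are given on the command line.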
A Workflow is fully defined by 4 files stored in GitHub
Set OS / software execution environment
Sequence BIAFLOWS <--> workflow interactions
IJ macro, Python script, compiled source code…
Define input parameters + description
Workflow specific
Mostly Common to BIA platform/ecosystem
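The idea of declaring input parameters together with their descriptions in one place can be illustrated with a small sketch. The JSON layout below is a hypothetical, simplified schema invented for this example and is not the actual BIAFLOWS descriptor format. It only shows how a declarative parameter list can drive the command-line interface of a workflow.

```python
import argparse
import json

# Hypothetical, simplified descriptor: names, types and defaults are made up.
DESCRIPTOR_JSON = """
{
  "name": "example-workflow",
  "parameters": [
    {"id": "radius", "type": "float", "default": 5.0,
     "description": "Smoothing radius in pixels"},
    {"id": "threshold", "type": "int", "default": 128,
     "description": "Intensity threshold"}
  ]
}
"""

TYPES = {"float": float, "int": int, "str": str}


def parser_from_descriptor(descriptor):
    """Build a command-line parser with one flag per declared parameter."""
    parser = argparse.ArgumentParser(prog=descriptor["name"])
    for param in descriptor["parameters"]:
        parser.add_argument(
            "--" + param["id"],
            type=TYPES[param["type"]],
            default=param["default"],
            help=param["description"],
        )
    return parser


descriptor = json.loads(DESCRIPTOR_JSON)
parser = parser_from_descriptor(descriptor)
```

Keeping names, defaults and descriptions in a single file means that a user interface and the command line can be generated from the same source.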
Let’s get started with BIAFLOWS!
BIAFLOWS Problem Classes
Problem Class              Sample Problem              Annotations
Object Counting            Vesicles 2D/3D              Binary masks (dots)
Object Detection           Vesicles 2D/3D              Binary masks (dots)
Landmark Detection         Drosophila wing vertices    Label masks (dots)
Object Segmentation        Nuclei 2D/3D                Label masks
Pixel Classification       Tumor, gland                Label masks
Filament Tree Tracing      Neurons                     SWC
Filament Network Tracing   Blood vessels               Skeleton binary masks
Particle Tracking          Non dividing nuclei         Label masks
Object Tracking            Dividing nuclei             Label masks + division text file
BIAFLOWS: A webtool supercharging benchmarking
Store annotated datasets illustrating key BIA problems
Standard data formats and metrics for 9 BIA problem classes
(e.g. object segmentation)
Versioned workflows + full execution environment
Default parameters optimized for the datasets
Visualize workflow results + compute benchmark metrics
Explore benchmark metrics (statistics or per image)
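To make "benchmark metrics (statistics or per image)" concrete, here is a minimal sketch that computes the Dice coefficient between a predicted and a ground-truth binary mask, then summarizes it over a dataset. Dice is used purely as a familiar example. The slides do not state which metrics BIAFLOWS computes for each problem class, and the image names are made up.

```python
from statistics import mean


def dice_coefficient(prediction, ground_truth):
    """Dice overlap of two binary masks given as nested lists of 0/1 values."""
    pred = [bool(v) for row in prediction for v in row]
    truth = [bool(v) for row in ground_truth for v in row]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical results: image name -> (predicted mask, ground-truth mask).
results = {
    "image_1": ([[1, 1, 0], [0, 0, 0]], [[1, 0, 0], [1, 0, 0]]),
    "image_2": ([[0, 1], [0, 1]], [[0, 1], [0, 1]]),
}
per_image = {name: dice_coefficient(p, t) for name, (p, t) in results.items()}
summary = mean(per_image.values())
```

Here `per_image` corresponds to the per-image view and `summary` to an aggregate statistic over the dataset.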
Adding New Content to BIAFLOWS (Problems, Images and Annotations)
Adding new Workflows to BIAFLOWS
Computer Vision
ImageNet (Stanford)
> 14 M images (2023)
> 1 M images with bounding boxes
1,000 classes (cat, dog, …)
Common Objects in Context (COCO, Microsoft)
> 328 K images
2.5 M labeled object instances
91 object classes
Image Analysis Challenges
171 challenges
(May 2023)
2,170 CV datasets (May 2023)
7 with keyword microscopy
30 modality histology
5 modality Transmission EM
Grand-Challenge: Some Limitations
Grand-Challenge was designed as a challenge platform,
not as an open image repository or an open distribution platform for image analysis workflows
BIAFLOWS / Cytomine-Community
Roadmap
Still a prototype:
Small user community, low number of datasets (mostly synthetic)
Core Features
Explorative
Acknowledgements
CORE DEVELOPERS
Ulysse Rubens (ULiège),
Romain Mormont (ULiège)
Volker Baecker (MRI, Biocampus Montpellier)
Lassi Paavolainen (FIMM, HiLIFE, UHelsinki)
Benjamin Pavie (VIB), Leandro Aluisio Scholz (UFPR)
Raphaël Marée (ULiège)
Sébastien Tosi (IRB Barcelona)
CONTRIBUTORS
Ba Thien (ULiège), Renaud Hoyoux (Cytomine SCRL FS), Devrim Ünay (IUE), Gino Michiels (HEPL/ULiège), Anatole Chessel (Ecole Polytechnique), Martin Maska (Masaryk University), Rémy Vandaele (ULiège), Stefan Stanciu (Politehnica Bucarest), Ofra Golani (Weizmann Institute of Science), Graeme Ball (University of Dundee), Natasa Sladoje (Uppsala University), Perrine Paul-Gilloteaux (CNRS SFR Bonamy Nantes)