3 of 36

Free and open-source; Windows, Mac, Linux

Cited in 1,500+ papers per year

Used in 7/10 top pharma companies

In the Top 10 most popular papers in Genome Biology

Ranked most flexible and usable in independent analysis (Wiesmann et al.)

Anne Carpenter

Ray Jones

Lee Kamentsky

Allen Goodman

Claire McQuin

Beth Cimini

David Stirling

Alice Lucas

Nodar Gogoberidze

4 of 36

Software overview

Image analysis & quantification

Image-centric data analysis & machine learning

5 of 36

Software overview

Measure everything

Ask questions later

6 of 36

The CellProfiler interface

Pipeline panel

Settings panel

Module help

Start test mode

Set output folder

Start analysis run

Pipelines can be saved out as:

.cppipe or .json – text-based; human-readable (.cppipe) and/or machine-readable (.json). Best for sharing or for using a pipeline in a new location.
.cpproj – container file holding the pipeline you made and the file list of the images you loaded. Best for resuming work on the machine you were already using.

7 of 36

The CellProfiler interface

8 of 36

The CellProfiler interface

The next module to run

- Will not execute

- Will execute

- Will pause during “Run”

- Won’t pause during “Run”

- Won’t show display

- Will show display

- Module set correctly

- Module has an error

- Module giving a warning (such as “won’t run in test mode”)

Run until you hit a pause

Leave test mode

Run just the next module

Start over on the next image set

Launch the workspace viewer

Add, subtract, or reorder modules

9 of 36

The CellProfiler interface

Set what feeds into and out from every module

10 of 36

Module categories

File processing: Image input, file output

Image processing: Often used for pre-processing prior to object identification

Object processing: Identification, modification of objects of interest

Measurement: Collection of measurements from objects of interest

Data Tools: Measurement exploration, measurement output

Advanced: Typically modules for 3D analyses

Worm Toolbox: C. elegans-specific operations

Search modules for keywords

11 of 36

CellProfiler figure windows

The figure window has additional menu options

Toolbar menu: Home, pan, zoom in/out

CellProfiler Image Tools

Show pixel data (location, intensity)
Measure length between any two points just by clicking and dragging

12 of 36

Tips for creating a good high content analysis workflow

https://carpenter-singh-lab.broadinstitute.org/blog/when-to-say-good-enough

When finding the objects that you care about, ask yourself for your whole experiment:

Do I generally agree with most of the object segmentations from my analysis workflow?
Do I have an approximately equal number of regions/images where the threshold chosen by the algorithm for this image is a bit too low vs a bit too high?
Do I have an approximately equal number of oversegmentations/splits and undersegmentations/merges?
Very important: Do both the second and third bullet points hold true for both my negative control images and my positive control (or most extreme expected phenotype(s) sample) images?

13 of 36

“The thing I want to do doesn’t exist in CellProfiler!”

Are you sure it doesn’t?

Search the help, and/or post on image.sc

Is it an image processing utility that exists in ImageJ/Fiji?

Try out the RunImageJMacro module, which can point to your system ImageJ/Fiji

Go ahead and write your own!

We have templates to expand and a video on how to do it

14 of 36

Running large image sets in CellProfiler

A few – a few hundred images

Can likely run on your local machine
CellProfiler will automatically run in parallel, using up to your number of CPUs

A few hundred – a few tens of thousands of images

Talk to your local sysadmin about running on a cluster (directly or with Docker)
Check out our instructions on getting started

A few tens of thousands – a few million images

Can consider cloud processing
Check out our Distributed-CellProfiler package for running on AWS

15 of 36

Batch files

Easy way to transition from running locally to on the cluster
Data needs to have same structure on the cluster as on your local machine; path mapping needs to be right
Creates a .h5 file you can move to the cluster to run
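The path-mapping requirement above can be illustrated with a minimal Python sketch. It only shows the idea of swapping a local root folder for a cluster root folder; the function name `to_cluster_path` and all example paths are hypothetical, and in practice the mapping is configured in the CreateBatchFiles module itself.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_cluster_path(local_path, local_root, cluster_root):
    """Translate a local file path into the matching path on the cluster.

    PureWindowsPath accepts both backslashes and forward slashes, so the
    same function handles Windows-style and Mac/Linux-style local paths.
    Raises ValueError if local_path is not inside local_root.
    """
    relative = PureWindowsPath(local_path).relative_to(PureWindowsPath(local_root))
    return str(PurePosixPath(cluster_root).joinpath(*relative.parts))


# Hypothetical example: the same plate folder seen from a laptop and from a cluster.
print(to_cluster_path(r"C:\Users\me\images\plate1\A01.tif",
                      r"C:\Users\me\images",
                      "/cluster/images"))
```

The folder structure below the root is left untouched, which is why the data needs the same layout in both places.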

16 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

executable call: cellprofiler
headless flags: -c -r
pipeline location: -p path/to/pipeline
output folder: -o some/directory
which files: {INPUT}
how to group: {GROUPINGS}

Even if you’re running this “wrapped” in a service somewhere, it’s important to know what information CellProfiler needs!

https://carpenter-singh-lab.broadinstitute.org/blog/getting-started-using-cellprofiler-command-line
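The pieces of the headless command can also be assembled programmatically. The following is a minimal Python sketch, assuming `cellprofiler` is available on the PATH; the function name `build_headless_command` and the example paths are hypothetical, and only flags that appear in these slides are used.

```python
import shlex


def build_headless_command(pipeline, output_dir, input_args=(), grouping_args=(),
                           executable="cellprofiler"):
    """Assemble the argument list for a headless CellProfiler run."""
    command = [
        executable,             # executable call
        "-c", "-r",             # headless flags, always the same
        "-p", str(pipeline),    # pipeline location
        "-o", str(output_dir),  # output folder
    ]
    command += list(input_args)     # which files
    command += list(grouping_args)  # how to group
    return command


command = build_headless_command(
    "path/to/pipeline.cppipe",
    "some/directory",
    input_args=["-i", "path/to/folder"],
    grouping_args=["-g", "Metadata_Well=A01"],
)
# Print a copy-pasteable shell command; the list itself could be passed to subprocess.run.
print(shlex.join(command))
```

Keeping the command as a list of arguments avoids quoting problems when paths contain spaces.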

17 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

Executable call –

If installed in Python – cellprofiler or python -m cellprofiler or python3 -m cellprofiler
Windows executable - C:\Users\UserName\ProgramFiles\CellProfiler\CellProfiler.exe
Mac executable - /Applications/CellProfiler.app/Contents/MacOS/cp
Executables can be dragged and dropped to terminal
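For a Python installation, the choice between the options above can be sketched as follows. This is an illustrative sketch, not part of CellProfiler: the function name `find_cellprofiler_call` is hypothetical, and the fallback only works if CellProfiler is installed in the environment of the interpreter running the script.

```python
import shutil
import sys


def find_cellprofiler_call(which=shutil.which):
    """Return the argument list that starts CellProfiler on this machine."""
    found = which("cellprofiler")
    if found:
        # A `cellprofiler` command is on the PATH; use its full path.
        return [found]
    # Otherwise fall back to running the package as a module
    # with the interpreter that is running this script.
    return [sys.executable, "-m", "cellprofiler"]


print(find_cellprofiler_call())
```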

18 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

Headless flags – always the same, no need to adjust

19 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

Pipeline location –

Can be a .cppipe file or a batch file created with CreateBatchFiles; .cpproj generally does not work well

20 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

Output folder –

Where you want your output to go
In your CellProfiler pipeline, ensure all exporting modules (SaveImages, SaveCroppedObjects, ExportToSpreadsheet, ExportToDatabase) are using “Default Output Folder” (or a subfolder of it) as their export location

21 of 36

Getting data into CellProfiler - Input Modules

4 modules in total, handle all the “bookkeeping” of what your experimental setup is

Images (mandatory) – tell CellProfiler which images you want to analyze
Metadata (optional if one field of view per file) – give CellProfiler metadata from the file header OR file name
NamesAndTypes (mandatory) – tell CellProfiler if 2D vs 3D, how to break down channels, any other bookkeeping
Groups (mandatory for tracking, Z projection, or whole-plate correction pipelines, recommended for cluster processing, otherwise optional) – tell CellProfiler if it is important to keep any image sets together during processing

See a great blog post about this, with links to a video tutorial, at broad.io/CellProfilerInput

22 of 36

Getting data into CellProfiler - LoadData

Create a CSV that instructs CellProfiler on how the images should be parsed – path and file name for each channel, any metadata you want included
You can add grouping and/or filtering to specific rows in the LoadData module settings
Handy if you’re comfortable scripting, and your data names are regularized!
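A LoadData CSV of the kind described above can be generated with a short script. The following is a minimal Python sketch, assuming a hypothetical file naming scheme (`<well>_s<site>_<channel>.tif`) and hypothetical channel names and folder; the `FileName_`, `PathName_`, and `Metadata_` column prefixes follow the LoadData naming convention, but check the module help for your CellProfiler version.

```python
import csv
import io
import re

# Hypothetical naming scheme: <well>_s<site>_<channel>.tif, e.g. A01_s1_DNA.tif
FILENAME_PATTERN = re.compile(
    r"(?P<well>[A-P]\d{2})_s(?P<site>\d+)_(?P<channel>\w+)\.tif$"
)


def build_load_data_rows(filenames, folder):
    """Build one row per (well, site), with a file/path column pair per channel."""
    image_sets = {}
    for name in sorted(filenames):
        match = FILENAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        key = (match["well"], match["site"])
        row = image_sets.setdefault(
            key, {"Metadata_Well": match["well"], "Metadata_Site": match["site"]}
        )
        row[f"FileName_{match['channel']}"] = name
        row[f"PathName_{match['channel']}"] = folder
    return [image_sets[key] for key in sorted(image_sets)]


def write_load_data_csv(rows, handle):
    """Write rows as CSV; the header comes from the first row, so every
    image set is assumed to have the same channels."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


files = ["A01_s1_DNA.tif", "A01_s1_Actin.tif", "A02_s1_DNA.tif", "A02_s1_Actin.tif"]
rows = build_load_data_rows(files, "/path/to/images")
buffer = io.StringIO()  # swap for open("load_data.csv", "w", newline="") to write a file
write_load_data_csv(rows, buffer)
print(buffer.getvalue())
```

This only works smoothly when file names are regular enough for one pattern to capture them, which is the caveat noted on the slide.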

23 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

Which files –

If you’re using LoadData:

--data-file path/to/file.csv

If you’re using the Input modules and a .cppipe file:

Point to a folder on your cluster, run on all images there: -i path/to/folder
Pass in a text file listing images: --file-list path/to/file.txt

If you’re using the Input modules and a batch file:

Nothing needs to be entered here, it’s encoded in the batch file
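The three cases above map onto the {INPUT} part of the command roughly as follows. This is a minimal Python sketch; the function name `input_arguments` and the case labels are hypothetical, and only the flags listed on this slide are used.

```python
def input_arguments(source, path=None):
    """Return the {INPUT} part of the command for one of the cases above."""
    flags = {
        "load_data": "--data-file",  # CSV read by the LoadData module
        "folder": "-i",              # run on all images in a folder
        "file_list": "--file-list",  # text file listing the images
    }
    if source == "batch_file":
        return []  # the inputs are already encoded in the batch file
    if source not in flags:
        raise ValueError(f"unknown input source: {source!r}")
    if path is None:
        raise ValueError(f"{source} needs a path")
    return [flags[source], str(path)]


print(input_arguments("load_data", "path/to/file.csv"))
print(input_arguments("batch_file"))
```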

24 of 36

In practice, how do I run CellProfiler headlessly?

cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

How to group –

Some workflows (e.g. tracking, plate illumination correction) demand particular groupings (typically metadata-based)
Grouping otherwise allows parallelization – rather than a small number of CPUs each processing thousands of files, thousands of CPUs can each process a small number of files
Group by metadata: -g Metadata_Well=A01
Select a range of image sets by number: -f 11 -l 20 (first and last image set to process)
If using a batch file, you can get it to print all the groups present: --get-batch-commands-new

Add this to use -f/-l flags: --images-per-batch
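The -f/-l ranges for a whole experiment can be computed with a few lines of code. The following is a minimal Python sketch, assuming image sets are numbered from 1 and that both ends of the range are inclusive; the function name `batch_ranges` is hypothetical.

```python
def batch_ranges(total_image_sets, images_per_batch):
    """Yield (first, last) image set numbers, 1-indexed and inclusive."""
    if total_image_sets < 0 or images_per_batch < 1:
        raise ValueError("need total_image_sets >= 0 and images_per_batch >= 1")
    for first in range(1, total_image_sets + 1, images_per_batch):
        # The final batch may be shorter than the others.
        last = min(first + images_per_batch - 1, total_image_sets)
        yield first, last


# 25 image sets in batches of 10: three jobs, the last one smaller.
for first, last in batch_ranges(25, 10):
    print(f"-f {first} -l {last}")
```

Each printed pair can be appended to the headless command to form one cluster job.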

25 of 36

Ok, I get the principles, how do I ACTUALLY do this?

26 of 36

Your local cluster

1. Install CellProfiler on your cluster (version 4 or later recommended; versions 3 and below run on Python 2, which is past end of life)
2. Generate execution commands for the job in question (manually or using the flags demonstrated above)
3. Put them into your cluster’s submission system

Pros:

Local
Likely free to you

Cons:

Dependent on local bandwidth
Likely need IT support for setting up 1 and 3

Installation SHOULD be smooth, but…

Hard to support multiple CellProfiler versions
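Generating the execution commands for a cluster can be scripted. The following is a minimal Python sketch that writes one command per well of a 96-well plate; it assumes the pipeline groups by a metadata key called `Metadata_Well`, and the batch file name, output folder, and function names are hypothetical placeholders.

```python
import shlex
from string import ascii_uppercase


def well_names(rows=8, columns=12):
    """Return well names such as A01 ... H12 for a plate of the given size."""
    return [f"{ascii_uppercase[r]}{c:02d}"
            for r in range(rows) for c in range(1, columns + 1)]


def commands_per_well(pipeline, output_dir, wells):
    """Yield one headless CellProfiler command line per well."""
    for well in wells:
        yield shlex.join([
            "cellprofiler", "-c", "-r",
            "-p", pipeline,
            "-o", output_dir,
            "-g", f"Metadata_Well={well}",
        ])


job_lines = list(commands_per_well("path/to/Batch_data.h5", "some/directory", well_names()))
print(len(job_lines))
print(job_lines[0])
```

How these lines are handed to the scheduler depends on your cluster’s submission system, so that step is left out here.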

27 of 36

Containerization solves installation and version issues

Containerization: someone installs it once, you use their installation in a tiny OS “box” forever after

Reproducible!
Use it anywhere!*
You personally only have to install a program to run containers, and never anything else again!
Typically involves some code to run; many containers do not come with GUIs (or can be painful to use them)
Groups such as biocontainers have already made a LOT (>1000) of them
Developers tend to prefer Docker containers, sysadmins Singularity containers (but Singularity can run Docker containers)

https://biocontainers.pro/

28 of 36

Docker

Can be local (your own machine, your university cluster) or somewhere in the cloud
docker run \
    --volume=some/input/folder:/input \
    --volume=some/output/folder:/output \
    cellprofiler/cellprofiler:4.2.4 \
    cellprofiler -c -r -p path/to/pipeline -o some/directory {INPUT} {GROUPINGS}

First line tells Docker to run a container
Second line mounts the folder where your images are located
Third line mounts the folder where you want your output to go
Fourth line is the container (can use your own/other versions too)
Fifth line you should already understand!
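The Docker call can be assembled the same way as the plain command. The following is a minimal Python sketch; the function name `build_docker_command` and the host paths are hypothetical, and it assumes the pipeline file sits inside the mounted input folder so that the container sees it under `/input`. Docker bind mounts generally need absolute host paths.

```python
import shlex


def build_docker_command(input_dir, output_dir, cellprofiler_args,
                         image="cellprofiler/cellprofiler:4.2.4"):
    """Assemble a `docker run` call that mounts input and output folders."""
    return [
        "docker", "run",
        f"--volume={input_dir}:/input",    # where your images are located
        f"--volume={output_dir}:/output",  # where you want your output to go
        image,                             # the container to run
        *cellprofiler_args,                # the CellProfiler command itself
    ]


command = build_docker_command(
    "/data/images",
    "/data/results",
    # Paths in the CellProfiler command are the ones seen inside the container.
    ["cellprofiler", "-c", "-r", "-p", "/input/pipeline.cppipe",
     "-o", "/output", "-i", "/input"],
)
print(shlex.join(command))
```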

https://github.com/CellProfiler/distribution/blob/master/docker/Dockerfile

29 of 36

Galaxy

An easy-to-use (for end users) way to put a GUI onto an analysis, as well as make it shareable and reproducible

Developers need to create an XML file that “wraps” the analysis and tells Galaxy what type of input to expect, what type of output to expect, etc

Can run interactive tools such as Jupyter, etc
Many instances running on many physical pieces of hardware all over the world
CellProfiler can be run very simply in the Galaxy Imaging node – v3.1.9 and v4.2.1 ONLY, and only single threaded (no grouping flags) for now. In v3.1.9 you can build a pipeline from modules; in v4.2.1 you can only run premade .cppipe files.
Pros:

Easy to use
Likely free to you
Easy to share analyses, make them reproducible, etc

Cons:

Dependent on bandwidth of your Galaxy host
Creating a wrapper can be painful for new developers

https://imaging.usegalaxy.eu/

https://training.galaxyproject.org/training-material/topics/imaging/tutorials/object-tracking-using-cell-profiler/tutorial.html

30 of 36

Terra

Terra.bio – made by Broad Institute, Verily (Google), and Microsoft
Run analyses in Google Cloud, on data stored there OR in Azure OR in Amazon Web Services (AWS); can also be used to run Galaxy
Can run interactive tools such as Jupyter, or workflows by making a “wrapper” using WDL (Workflow Description Language)
Pros:

In the cloud, so bandwidth is never an issue
Lots of example workflows, especially in genomics

Cons:

Not free – though can get $300 in credits
Current implementations may not support all grouping strategies, and only support .cppipe files
Another workflow language to learn!

https://terra.bio/

31 of 36

Distributed-CellProfiler

Run CellProfiler in the cloud on AWS
No need to know how to code, just edit a configuration file and execute pre-made scripts
Pros:

In the cloud, so bandwidth is never an issue
Just need to fill out a pre-made JSON file, no coding required
Extends out to non-CellProfiler projects with the rest of the DistributedScience universe
Made by the CellProfiler team, so good integration – supports batch files, grouping, etc

Cons:

Not free
Command line-only

https://github.com/DistributedScience

32 of 36

How can I learn how to do this stuff?

Where can I go for help?

33 of 36

forum.image.sc - Open scientific community forum for bioimage analysis and beyond

And finally, a bit about the broader bioimaging community. Image analysis forums play a major role in assisting biologists in using software for image analysis. Currently, each open-source software package in bioimaging has its own distinct forum or email list. But recently, the groups leading the CellProfiler and ImageJ projects teamed up to merge their forums towards creating a collaborative network of support for the scientific imaging community.
The merge is currently in progress, and we expect the site to be up in the next couple of months. Until then, I invite you to go ahead and look up these forums if you haven't already to get a sense of the support networks that already exist. You will find that image analysis can be complicated, and that it sometimes takes a village to solve a problem, but the community is there to help. So as you launch all those exciting projects in this new center – tap in to this resource – and welcome to the village!

34 of 36

Center for Open Bioimage Analysis

Openbioimageanalysis.org


35 of 36

Gratitude

Recent major funding for this work provided by:

CZI Imaging Scientist Fellowship
NIH NIGMS: MIRA R35 GM122547
CZI Software Fellows program
NIH NIGMS: P41 GM135019

Many thanks to our many biology collaborators

Beth Cimini

Mario Cruz

Barbara Diaz-Rohrer

Fernanda Fossa

Melissa Gillis

Nodar Gogoberidze

Serena Larew

Andréa Papaleo

Marine Secchi

Rebecca Senft

Callum Tromans-Coia

Erin Weisbart

Anne Carpenter

Shantanu Singh

John Arevalo

Niranj Chandrasekaran

Marzieh Haghighi

Yu Han

Alexander Kalinin

Serena Larew

Becki Ledford

Robert van Dijk

Cimini Lab members

IMAGING PLATFORM

Carpenter-Singh Lab members

36 of 36

Hands-on

Activity – run CellProfiler headless on your own machine

Can use the “Beginner Segmentation” images and “final” pipeline from tutorials.cellprofiler.org
Get it to run headlessly with the first and last flags (-f/-l), as well as with grouping flags (-g). What must you do in CellProfiler to get those to work?
Optionally, install Docker on your machine, and try to do the same thing with the CellProfiler Docker container

Reminder, you can get these slides at broad.io/neubias23