1 of 25

A Hitchhiker's Guide to DeepProfiler

Jamboree day 11/19/21

Michael Bornholdt

IMAGING

PLATFORM

2 of 25

Todays Workshop

12:00 pm: Intro
DeepProfiler basics
Profiling pipeline
Break
1:00 pm: Hands-on time!
Break
Profiling scale-up / Distributed DP
2:30 pm: Future outlook (Juan)
Break
Training models

3 of 25

Mandatory intro to Profiling

4 of 25

My last 6 months

Project in one line:

“I designed a protocol for robustly and scalably quantifying cellular states from images using deep neural networks; the JUMP consortium will use this protocol to profile >2B cells to generate a dataset that will be made public in 2022”

5 of 25

Motivation

Why do we want to use DeepProfiler over CellProfiler?

Performance (higher scores on evaluation metrics)
Costs (it's cheaper)
Faster (GPU acceleration)
Autonomous (no need for human input to tweak parameters)
Less downstream processing (e.g., no normalization)
Potential for models that are robust against batch effects

Background of DeepProfiler

Developed over the last few years
Recent improvements in the last few months
Currently at version 0.3.0
Current focus: documentation and making it available to a wider audience
I have worked with DP for my thesis over the last few months

6 of 25

Introduction to DeepProfiler

Topics

Installing
Folder Structure
Images, Location files, Crops
Config file, Index file
Command line details

7 of 25

Installing DP

Packages:

beautifulsoup4>=4.6
click>=6.7
comet-ml>=3.1.6
efficientnet==1.1.1
gpyopt>=1.2
lxml>=4.2
numpy>=1.13
pandas>=0.23.0
scikit-image>=0.14.0
scikit-learn>=0.19.0
scipy>=1.1
tensorflow==2.5.*
tensorflow_addons
tqdm>=4.62

Docker image: https://hub.docker.com/repository/docker/michaelbornholdt/deep_profiler

Requires: TensorFlow 2.5

Make sure the matching CUDA and cuDNN versions (CUDA 11.2 and cuDNN 8.1 for TensorFlow 2.5) are installed and their paths are set
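Most setup problems come from version mismatches, so a quick pre-flight check can save time. The sketch below is a hypothetical helper (not part of DeepProfiler) that uses Python's standard `importlib.metadata` to compare installed versions against a few of the pins above. Its version comparison is deliberately simplified (numeric dotted versions only, not full PEP 440), so treat a warning as a prompt to look closer rather than a verdict.

```python
from importlib import metadata

# A few of the pins listed above (extend as needed).
REQUIREMENTS = {
    "tensorflow": "==2.5.*",
    "efficientnet": "==1.1.1",
    "numpy": ">=1.13",
    "pandas": ">=0.23.0",
    "scikit-image": ">=0.14.0",
}


def _as_tuple(version):
    """'2.5.1' -> (2, 5, 1); non-numeric parts such as '0rc1' are dropped (simplification)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def _padded(a, b):
    """Pad two version tuples with zeros so (1, 1) compares equal to (1, 1, 0)."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(installed, spec):
    """Check an installed version against one '>=X', '==X' or '==X.*' specifier."""
    if spec.startswith(">="):
        have, need = _padded(_as_tuple(installed), _as_tuple(spec[2:]))
        return have >= need
    if spec.startswith("==") and spec.endswith(".*"):
        prefix = _as_tuple(spec[2:-2])
        return _as_tuple(installed)[: len(prefix)] == prefix
    if spec.startswith("=="):
        have, need = _padded(_as_tuple(installed), _as_tuple(spec[2:]))
        return have == need
    raise ValueError(f"unsupported specifier: {spec}")


def check_environment(requirements=REQUIREMENTS):
    """Return a list of problems; an empty list means every checked package matches."""
    problems = []
    for package, spec in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (need {spec})")
            continue
        if not satisfies(installed, spec):
            problems.append(f"{package}: found {installed}, need {spec}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All checked packages match."]:
        print(line)
```

This only checks Python packages; CUDA and cuDNN still have to be verified separately on the machine.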

8 of 25

Input / Output of DP

9 of 25

Images and location files

Location files:

Hold cell centers
Come from CellProfiler

Images:

Compressed
X sites per well
N channels
Stored in a nested folder structure
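To make the link between location files and crops concrete, here is a minimal sketch that turns one location file into square crop boxes around each cell center. It assumes a CSV with CellProfiler-style columns `Nuclei_Location_Center_X` / `Nuclei_Location_Center_Y`, a hypothetical `box_size` of 96 and a hypothetical 1080 x 1080 image; your column names, crop size and image size come from your own data and config. Cells too close to the image border are simply skipped here, which may differ from what DeepProfiler actually does.

```python
import csv
import io

X_FIELD = "Nuclei_Location_Center_X"  # assumed column name
Y_FIELD = "Nuclei_Location_Center_Y"  # assumed column name


def crop_boxes(location_csv, box_size, image_width, image_height):
    """Return (x0, y0, x1, y1) boxes centered on each cell in a location file.

    Cells whose box would extend past the image border are skipped.
    """
    half = box_size // 2
    boxes = []
    for row in csv.DictReader(location_csv):
        x = round(float(row[X_FIELD]))
        y = round(float(row[Y_FIELD]))
        x0, y0 = x - half, y - half
        x1, y1 = x0 + box_size, y0 + box_size
        if x0 < 0 or y0 < 0 or x1 > image_width or y1 > image_height:
            continue  # too close to the border for a full crop
        boxes.append((x0, y0, x1, y1))
    return boxes


example = io.StringIO(
    "Nuclei_Location_Center_X,Nuclei_Location_Center_Y\n"
    "100.4,200.6\n"
    "10.0,10.0\n"
)
print(crop_boxes(example, box_size=96, image_width=1080, image_height=1080))
# [(52, 153, 148, 249)]  (the second cell is too close to the corner)
```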

10 of 25

Index and Config

df.columns

Index(['Metadata_Plate', 'Metadata_Well', 'Metadata_Site', 'Metadata_broad_sample', 'Metadata_moa', 'Metadata_mmoles_per_liter', 'Metadata_dose_recode', 'RNA', 'ER', 'AGP', 'Mito', 'DNA', 'Concentration', 'Treatment_ID', 'Compound', 'pert_iname', 'Treatment_Replicate', 'Treatment', 'Plate_Map_Name', 'Split'], dtype='object')

"train": {
    "partition": {
        "targets": ["Compound"],
        "split_field": "Split",
        "training_values": [0],
        "validation_values": [1]
    },
    "model": {
        "name": "efficientnet",
        "augmentations": false,
        "crop_generator": "sampled_crop_generator",
        "metrics": ["accuracy", "top_k"],
        "epochs": 20,
        "initialization": "ImageNet",
        "params": {
            "learning_rate": 0.02,
            "batch_size": 256,
            "conv_blocks": 0,
            "label_smoothing": 0.0,
            "feature_dim": 256,
            "pooling": "avg"
        }
    }
}

Index file: the most important file in DP! It contains information about the experiment, with the minimum columns necessary to do sampling and run learning algorithms.

Config file: a text file in JSON format that organizes the various settings for one experiment.
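The `partition` block decides which index rows are used for training and which for validation. Below is a minimal sketch of that logic as we read it from the config (not DeepProfiler's actual code): rows whose `Split` value is in `training_values` go to training, those in `validation_values` to validation, and anything else is left out. The index rows and the compound name `compound_A` are made up.

```python
import csv
import io
import json

# Trimmed copy of the "partition" block from the config.
CONFIG = json.loads("""
{"train": {"partition": {"targets": ["Compound"],
                         "split_field": "Split",
                         "training_values": [0],
                         "validation_values": [1]}}}
""")


def partition(index_rows, config):
    """Split index rows into (training, validation) lists using train.partition."""
    part = config["train"]["partition"]
    field = part["split_field"]
    # csv.DictReader yields strings, so compare against the string form of each value.
    train_values = {str(v) for v in part["training_values"]}
    val_values = {str(v) for v in part["validation_values"]}
    rows = list(index_rows)
    if rows:
        missing = [c for c in (field, *part["targets"]) if c not in rows[0]]
        if missing:
            raise KeyError(f"index is missing columns: {missing}")
    train, val = [], []
    for row in rows:
        if row[field] in train_values:
            train.append(row)
        elif row[field] in val_values:
            val.append(row)
        # rows with any other split value are ignored in this sketch
    return train, val


index_csv = io.StringIO(
    "Metadata_Plate,Metadata_Well,Metadata_Site,Compound,Split\n"
    "P1,A01,1,DMSO,0\n"
    "P1,A02,1,compound_A,0\n"
    "P1,B01,1,DMSO,1\n"
    "P1,B02,1,compound_A,2\n"
)
train, val = partition(csv.DictReader(index_csv), CONFIG)
print(len(train), len(val))  # 2 1
```

Checking that the split field and targets exist up front surfaces a mismatched index early instead of partway through a run.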

11 of 25

Index file

Must have:

Metadata_Plate
Metadata_Well
Metadata_Site
Plate_Map_Name
One column per channel (e.g., RNA, ER, AGP, Mito, DNA), holding the image file paths
Treatment
Replicate

How do we create and manage these index files?
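One possible answer: generate the index from a plate map instead of editing it by hand. The sketch below is hypothetical throughout; in particular `PATH_TEMPLATE` is an invented file-naming scheme and the plate names, plate-map name and treatments are placeholders, so replace them with however your images and metadata are actually organized. It produces one row per plate/well/site with the must-have columns plus one image-path column per channel.

```python
import csv
import io

CHANNELS = ["DNA", "RNA", "ER", "AGP", "Mito"]
# Hypothetical file-naming scheme; replace with how your images are stored.
PATH_TEMPLATE = "{plate}/{well}_s{site}_{channel}.png"


def build_index(plates, plate_map, sites):
    """plates: [(plate, plate_map_name, replicate)], plate_map: {well: treatment}.

    Returns one row per plate/well/site with the must-have columns
    plus one image-path column per channel.
    """
    rows = []
    for plate, plate_map_name, replicate in plates:
        for well, treatment in plate_map.items():
            for site in sites:
                row = {
                    "Metadata_Plate": plate,
                    "Metadata_Well": well,
                    "Metadata_Site": site,
                    "Plate_Map_Name": plate_map_name,
                    "Treatment": treatment,
                    "Replicate": replicate,
                }
                for channel in CHANNELS:
                    row[channel] = PATH_TEMPLATE.format(
                        plate=plate, well=well, site=site, channel=channel
                    )
                rows.append(row)
    return rows


def write_index(rows, handle):
    """Write rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


rows = build_index(
    plates=[("P1", "MAP_1", 1), ("P2", "MAP_1", 2)],
    plate_map={"A01": "DMSO", "A02": "compound_A"},
    sites=[1, 2],
)
buffer = io.StringIO()
write_index(rows, buffer)
print(buffer.getvalue().splitlines()[0])
print(len(rows), "rows")  # 8 rows
```

To write a real file, pass `open("index.csv", "w", newline="")` instead of the in-memory buffer. Keeping the plate map as the single source of truth makes the index easy to regenerate when plates are added.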

12 of 25

Command line

python3 deepprofiler --root=/home/ubuntu/project/ --config filename.json train

python3 deepprofiler --root=/home/ubuntu/project/ --config filename.json profile

Options:

--root PATH Root directory for DeepProfiler experiment

--config TEXT Path to existing config file

--cores INTEGER Number of CPU cores for parallel processing (all=0)

--gpu TEXT GPU device id

--exp TEXT Name of experiment

--single-cells TEXT Name of single cell export folder

--metadata TEXT Name of metadata file, default index.csv

--logging TEXT Path to file with comet.ml API key

--help Show this message and exit.
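When scripting many runs (for example one per experiment), it can help to assemble the command in Python. The helper below is a hypothetical sketch that only uses the options listed above and the subcommand names from the pipeline overview (`prepare`, `export-sc`, `train`, `profile`); it builds the argument list but does not run it. Options go before the subcommand, as in the examples above.

```python
import shlex

# Subcommand names taken from the pipeline overview slide.
ACTIONS = ("prepare", "export-sc", "train", "profile")


def dp_command(action, root, config, gpu=None, exp=None, metadata=None, single_cells=None):
    """Build the argument list for one DeepProfiler run (does not execute it)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    cmd = ["python3", "deepprofiler", f"--root={root}", "--config", config]
    optional = {
        "--gpu": gpu,
        "--exp": exp,
        "--metadata": metadata,
        "--single-cells": single_cells,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(action)  # the subcommand comes last
    return cmd


print(shlex.join(dp_command("profile", "/home/ubuntu/project/", "filename.json", gpu=0)))
```

To actually launch a run, pass the list to `subprocess.run(cmd, check=True)` from the directory where `python3 deepprofiler` works in your setup.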

13 of 25

The Profiling pipeline

The full profiling pipeline with DeepProfiler looks like the following:

1. Acquire images
2. Extract cell locations (with CellProfiler)
3. Move data to DeepProfiler format
4. Compress images and perform illumination correction (DP prepare)
5. Export single-cell crops to prepare for training (DP export-sc)
6. Train model on data (DP train), or use a pretrained model
7. Process data: run inference (DP profile)
8. Aggregate onto well level (Pycytominer or DeepProfiler Aggregate)
9. Run evaluation (Cytominer-eval)
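Aggregation collapses thousands of single-cell feature vectors into one profile per well. Pycytominer and DeepProfiler's own aggregation do this on real data; the sketch below only illustrates the idea with a per-feature median (one common choice) over made-up feature names `feature_0` and `feature_1`.

```python
from collections import defaultdict
from statistics import median


def aggregate_wells(single_cells, feature_names, operation=median):
    """Collapse single-cell feature rows into one profile per (plate, well)."""
    groups = defaultdict(list)
    for cell in single_cells:
        groups[(cell["Metadata_Plate"], cell["Metadata_Well"])].append(cell)
    return {
        key: {f: operation([c[f] for c in cells]) for f in feature_names}
        for key, cells in groups.items()
    }


cells = [
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "feature_0": 0.1, "feature_1": 1.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "feature_0": 0.3, "feature_1": 3.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "feature_0": 0.2, "feature_1": 2.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A02", "feature_0": 0.5, "feature_1": 5.0},
]
profiles = aggregate_wells(cells, ["feature_0", "feature_1"])
print(profiles[("P1", "A01")])  # {'feature_0': 0.2, 'feature_1': 2.0}
```

The median is less sensitive than the mean to a few badly segmented or out-of-focus cells in a well.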

14 of 25

Hands-on time after the break

Download example LINCS data
Familiarize yourself with the DeepProfiler structure and files
Prepare the images with DP
Profile the data with either pre-trained or self-trained model
Aggregate the profiles
Be happy

15 of 25

10 min BREAK

16 of 25

Hands-on time!

Download example LINCS data
Familiarize yourself with the DP structure and files
Run Docker container and mount the example data
Prepare the images with DP
Profile the data with either pre-trained or self-trained model
Aggregate the profiles
Extra: Visualize your results
Don’t Panic

Form groups of two: pair a command-line “expert” with a “non-expert”
Online participants: use breakout rooms
All links are on the Helper Page
Ask Michael or Niranj, or use the Slack channel
Log all problems in the Error Log doc!
Comment on any issues and ideas in the Wiki
The Helper Page also provides expected output
If you get really stuck, I can give you the correct data

17 of 25

The Profiling pipeline

18 of 25

Common mistakes!

Images have no matching location files, and vice versa
Location coordinates are incorrect
CUDA (wrong CUDA/cuDNN versions or paths not set)
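The first two mistakes can usually be caught before a long run. A minimal, hypothetical check: compare the (plate, well, site) keys present in the index with those that have location files, and flag cell centers that fall outside the image. How you collect these keys depends on your folder layout, so the inputs here are plain Python sets and dicts, and the 1080 x 1080 image size is an assumption. Out-of-bounds centers can indicate, for example, swapped x/y columns or locations measured at a different image resolution.

```python
def check_inputs(index_sites, location_sites, centers, image_width, image_height):
    """Return a report of mismatched sites and out-of-bounds cell centers.

    index_sites / location_sites: sets of (plate, well, site) tuples.
    centers: {(plate, well, site): [(x, y), ...]} read from location files.
    """
    out_of_bounds = [
        (site, x, y)
        for site, coords in centers.items()
        for x, y in coords
        if not (0 <= x < image_width and 0 <= y < image_height)
    ]
    return {
        "images_without_locations": sorted(index_sites - location_sites),
        "locations_without_images": sorted(location_sites - index_sites),
        "out_of_bounds": out_of_bounds,
    }


report = check_inputs(
    index_sites={("P1", "A01", 1), ("P1", "A01", 2)},
    location_sites={("P1", "A01", 1), ("P1", "A02", 1)},
    centers={("P1", "A01", 1): [(100.0, 200.0), (2100.0, 50.0)]},
    image_width=1080,
    image_height=1080,
)
for problem, items in report.items():
    print(problem, items)
```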

19 of 25

Break time!

20 of 25

Feedback from hands-on

Group by group:

Go through error log
Go through Wiki comments
General feedback

Can someone document this in the Error Log?

21 of 25

Discussion on Distributed DP

Speeds and resource requirements

Preparing: ~4 CPU hours/plate

Profiling:

AWS P3 instance (NVIDIA Tesla V100): ~2.5 GPU hours/plate
NVIDIA A100: 0.25 GPU hours/plate

Aggregation: ~0.05 CPU hours/plate

---

For comparison, CellProfiler: 288 CPU hours/plate = $10 (vs. $0.4 with DeepProfiler)

To consider

Acquiring location files
Distributed prepare (CPU)
Distributed Profiling (GPU)
Distributed Aggregate (CPU)

Storage

Images: ~30 GB per plate

Compressed images: ~6 GB per plate

Profiles: 3 GB per plate
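The per-plate figures on this slide scale linearly, so a back-of-the-envelope budget for a larger screen is simple multiplication. The sketch below hard-codes the numbers from this slide (rough, setup-dependent estimates) and ignores overheads such as data transfer, queueing and acquiring location files.

```python
# Rough per-plate figures from this slide; real numbers depend on your setup.
PREPARE_CPU_H = 4.0
AGGREGATE_CPU_H = 0.05
PROFILE_GPU_H = {"V100": 2.5, "A100": 0.25}
CELLPROFILER_CPU_H = 288.0
STORAGE_GB = {"images": 30.0, "compressed_images": 6.0, "profiles": 3.0}


def estimate(n_plates, gpu="V100"):
    """Scale the per-plate compute and storage figures to n_plates."""
    if gpu not in PROFILE_GPU_H:
        raise ValueError(f"no per-plate figure for GPU {gpu!r}")
    result = {
        "dp_cpu_hours": n_plates * (PREPARE_CPU_H + AGGREGATE_CPU_H),
        "dp_gpu_hours": n_plates * PROFILE_GPU_H[gpu],
        "cellprofiler_cpu_hours": n_plates * CELLPROFILER_CPU_H,
    }
    for kind, gb in STORAGE_GB.items():
        result[f"{kind}_gb"] = n_plates * gb
    return result


for gpu in ("V100", "A100"):
    print(gpu, estimate(100, gpu))
```

Because each plate is independent, these totals also indicate how much distributing prepare, profile and aggregate across machines can shorten wall-clock time.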

Can someone document this in the Error Log?

22 of 25

Architecture

23 of 25

Future outlook (with Juan)

Manuscript writing
Extended documentation writing
And more...

Can someone document this in the Error Log?

24 of 25

Training models

We can talk about:

LINCS data
Differences between pre-trained-model profiles and CellProfiler (CP) profiles
Training experiments
Batch effects
Overfitting models
Regularization techniques
Using self-trained models vs pre-trained models

25 of 25

Weakly Supervised Representation Learning

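Main goal: population-level profiling

Auxiliary task: single-cell treatment classification (a CNN with a softmax output)

[Figure: cells treated with Drug A should look similar to each other, cells treated with Drug B should look similar to each other, and Drug A cells should look different from Drug B cells.]

Caicedo et al., 2018, CVPR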
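The auxiliary task is ordinary softmax classification of single cells by treatment. As a minimal sketch, here is the per-cell loss in plain Python, including the `label_smoothing` parameter that appears in the config (slide 10). It uses the common formulation in which the one-hot target is mixed with a uniform distribution over the K classes (this is how Keras defines `label_smoothing`); we have not checked that DeepProfiler computes exactly this loss. In this approach the features learned by the CNN, not the class probabilities, are what get aggregated into population-level profiles.

```python
import math


def log_softmax(logits):
    """Log-probabilities from raw class scores (max subtracted for numerical stability)."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def softmax(logits):
    return [math.exp(lp) for lp in log_softmax(logits)]


def smoothed_cross_entropy(logits, target, label_smoothing=0.0):
    """Cross-entropy of one cell's logits against its treatment index.

    With smoothing eps over K classes, the target distribution is
    (1 - eps) * one_hot(target) + eps / K.
    """
    k = len(logits)
    loss = 0.0
    for i, lp in enumerate(log_softmax(logits)):
        q = (1.0 - label_smoothing) * (i == target) + label_smoothing / k
        loss -= q * lp
    return loss


# Three hypothetical treatments; the cell was treated with treatment 0.
logits = [4.0, 1.0, 0.5]
print(smoothed_cross_entropy(logits, target=0))
print(smoothed_cross_entropy(logits, target=0, label_smoothing=0.1))
```

Smoothing keeps the network from becoming over-confident on individual cells, which matters here because cells under the same treatment are not all affected equally.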