1 of 25

A Hitchhiker's Guide to DeepProfiler

Jamboree day 11/19/21

Michael Bornholdt

IMAGING

PLATFORM

2 of 25

Today's Workshop

  • 12 pm: Intro
  • DeepProfiler basics
  • Profiling pipeline
  • Break
  • 1 pm: Hands-on time!
  • Break
  • Profiling scale-up / Distributed DP
  • 2:30 pm: Future outlook (Juan)
  • Break
  • Training models

3 of 25

Mandatory intro to Profiling

4 of 25

My last 6 months

Project in one line:

“I designed a protocol for robustly and scalably quantifying cellular states from images using deep neural networks; the JUMP consortium will use this protocol to profile >2B cells to generate a dataset that will be made public in 2022”

5 of 25

Motivation

Why do we want to use DeepProfiler over CellProfiler?

  1. Performance (higher scores on evaluation metrics)
  2. Cost (it's cheaper)
  3. Speed (GPU acceleration)
  4. Autonomy (no need for human input to tweak parameters)
  5. Less downstream processing (e.g., no normalization)
  6. Potential for models that are robust against batch effects

Background of DeepProfiler

  • Developed over the last few years
  • Recent improvements in the last few months
  • Currently at version 0.3.0
  • Current focus: documentation and making DP available to a wider audience
  • I have worked with DP for the last few months on my thesis

6 of 25

Introduction to DeepProfiler

Topics

  1. Installing
  2. Folder Structure
  3. Images, Location files, Crops
  4. Config file, Index file
  5. Command line details

7 of 25

Installing DP

Packages:

    "beautifulsoup4>=4.6",
    "click>=6.7",
    "comet_ml>=1.0",
    "efficientnet==1.1.1",
    "gpyopt>=1.2",
    "lxml>=4.2",
    "numpy>=1.13",
    "pandas>=0.23.0",
    "scikit-image>=0.14.0",
    "scikit-learn>=0.19.0",
    "scipy>=1.1",
    "comet-ml>=3.1.6",
    "tensorflow==2.5.*",
    "tensorflow_addons",
    "tqdm>=4.62",

Requires: TensorFlow 2.5

Make sure the CUDA and cuDNN versions that match TensorFlow 2.5 are installed and that their paths are set.
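
A quick way to confirm the pinned TensorFlow version before running DP is to query the installed package metadata. The sketch below uses only the Python standard library; the helper names `installed_version` and `matches_pin` are hypothetical, and the prefix match is a simplification of real version-specifier rules.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version, pin):
    """True if `version` satisfies a wildcard pin such as '2.5.*' (simple prefix match)."""
    if version is None:
        return False
    prefix = pin.rstrip("*")  # '2.5.*' -> '2.5.'
    return version.startswith(prefix) or version == prefix.rstrip(".")


if __name__ == "__main__":
    tf_version = installed_version("tensorflow")
    status = "OK" if matches_pin(tf_version, "2.5.*") else "does not match 2.5.*"
    print("tensorflow:", tf_version, status)
```

This only checks the Python package; CUDA and cuDNN still have to be verified separately on the machine.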

8 of 25

Input / Output of DP

9 of 25

Images and location files

Location files:

  • Hold cell centers
  • Come from CellProfiler

Images:

  • Compressed
  • X sites per well
  • N channels
  • Stored in a nested folder structure
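
To make the role of location files concrete, here is a minimal sketch that reads cell centers from a CSV and turns them into square crop boxes. The column names `Nuclei_Location_Center_X` and `Nuclei_Location_Center_Y`, the 128-pixel box, and the 1080-pixel image size are assumptions for illustration. Skipping cells near the image border is a simplification and not necessarily what DP does.

```python
import csv
import io


def read_cell_centers(csv_text,
                      x_col="Nuclei_Location_Center_X",
                      y_col="Nuclei_Location_Center_Y"):
    """Parse a location file and return a list of (x, y) cell centers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row[x_col]), float(row[y_col])) for row in reader]


def crop_box(center, box_size, image_width, image_height):
    """Return (left, top, right, bottom) for a square crop, or None if it leaves the image."""
    x, y = center
    half = box_size // 2
    left, top = int(round(x)) - half, int(round(y)) - half
    right, bottom = left + box_size, top + box_size
    if left < 0 or top < 0 or right > image_width or bottom > image_height:
        return None
    return (left, top, right, bottom)


# Hypothetical location file with two cells; the second one sits at the border.
example = "Nuclei_Location_Center_X,Nuclei_Location_Center_Y\n100.0,200.0\n5.0,5.0\n"
centers = read_cell_centers(example)
boxes = [crop_box(c, 128, 1080, 1080) for c in centers]
```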

10 of 25

Index and Config

Index file

The most important file in DP! This file contains information about the experiment, with the minimum columns necessary to do sampling and run learning algorithms.

df.columns

Index(['Metadata_Plate', 'Metadata_Well', 'Metadata_Site', 'Metadata_broad_sample', 'Metadata_moa', 'Metadata_mmoles_per_liter', 'Metadata_dose_recode', 'RNA', 'ER', 'AGP', 'Mito', 'DNA', 'Concentration', 'Treatment_ID', 'Compound', 'pert_iname', 'Treatment_Replicate', 'Treatment', 'Plate_Map_Name', 'Split'], dtype='object')

Config file

The configuration file is a text file in JSON format that organizes various settings for one experiment. Excerpt:

    "train": {
        "partition": {
            "targets": ["Compound"],
            "split_field": "Split",
            "training_values": [0],
            "validation_values": [1]
        },
        "model": {
            "name": "efficientnet",
            "augmentations": false,
            "crop_generator": "sampled_crop_generator",
            "metrics": ["accuracy", "top_k"],
            "epochs": 20,
            "initialization": "ImageNet",
            "params": {
                "learning_rate": 0.02,
                "batch_size": 256,
                "conv_blocks": 0,
                "label_smoothing": 0.0,
                "feature_dim": 256,
                "pooling": "avg"
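
The link between the two files is the partition block: `split_field` names an index column, and `training_values` / `validation_values` select rows by that column. The sketch below illustrates this with the standard library only. The config is cut down to the partition block and closed so that it parses, the index rows are made up, and `split_index` is a hypothetical helper rather than a DP function.

```python
import json

# Partition block from the slide, wrapped so that it is valid JSON on its own.
config_text = """
{
  "train": {
    "partition": {
      "targets": ["Compound"],
      "split_field": "Split",
      "training_values": [0],
      "validation_values": [1]
    }
  }
}
"""


def split_index(rows, config):
    """Split index rows into training and validation lists using the partition settings."""
    partition = config["train"]["partition"]
    field = partition["split_field"]
    train_values = set(partition["training_values"])
    val_values = set(partition["validation_values"])
    train = [r for r in rows if r[field] in train_values]
    val = [r for r in rows if r[field] in val_values]
    return train, val


config = json.loads(config_text)

# Made-up index rows (one per imaged site), for illustration only.
rows = [
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "Compound": "DMSO", "Split": 0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A02", "Compound": "drug_a", "Split": 0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A03", "Compound": "drug_a", "Split": 1},
]
train_rows, val_rows = split_index(rows, config)
```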

11 of 25

Index file

Must have:

  • Metadata_Plate
  • Metadata_Well
  • Metadata_Site
  • Plate_Map_Name
  • Channel_Name
  • Treatment
  • Replicate

How do we create and manage these index files?
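
One simple safeguard when creating index files is to check the header against the required columns before running anything. The sketch below assumes that "Channel_Name" on the slide stands for one column per imaging channel, so the channel names are passed in explicitly; `missing_columns` is a hypothetical helper, not part of DP.

```python
import csv
import io

# Required metadata columns as listed on the slide (channel columns are handled separately).
REQUIRED_COLUMNS = [
    "Metadata_Plate",
    "Metadata_Well",
    "Metadata_Site",
    "Plate_Map_Name",
    "Treatment",
    "Replicate",
]


def missing_columns(index_csv_text, channels):
    """Return the required columns (metadata plus channels) absent from the index header."""
    header = next(csv.reader(io.StringIO(index_csv_text)))
    expected = REQUIRED_COLUMNS + list(channels)
    return [column for column in expected if column not in header]


# Made-up header that lacks the Replicate column and the Mito channel.
example_index = (
    "Metadata_Plate,Metadata_Well,Metadata_Site,Plate_Map_Name,DNA,RNA,Treatment\n"
    "P1,A01,1,map_1,dna.png,rna.png,DMSO\n"
)
problems = missing_columns(example_index, channels=["DNA", "RNA", "Mito"])
```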

12 of 25

Command line

python3 deepprofiler --root=/home/ubuntu/project/ --config filename.json train

python3 deepprofiler --root=/home/ubuntu/project/ --config filename.json profile

Options:

  --root PATH          Root directory for DeepProfiler experiment
  --config TEXT        Path to existing config file
  --cores INTEGER      Number of CPU cores for parallel processing (all=0)
  --gpu TEXT           GPU device id
  --exp TEXT           Name of experiment
  --single-cells TEXT  Name of single cell export folder
  --metadata TEXT      Name of metadata file, default index.csv
  --logging TEXT       Path to file with comet.ml API key
  --help               Show this message and exit.
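
When the same command is run for many plates or experiments, it can help to assemble the argument list in a script. The sketch below only builds the list and does not execute anything. It uses the flags shown above and places them before the task name, as in the two example commands. `build_dp_command` is a hypothetical helper, and the set of task names is taken from the tasks mentioned in these slides.

```python
import shlex

# Task names that appear in this deck.
KNOWN_TASKS = {"prepare", "export-sc", "train", "profile"}


def build_dp_command(root, config, task, gpu=None, cores=None, exp=None):
    """Assemble a DeepProfiler command line as a list of arguments."""
    if task not in KNOWN_TASKS:
        raise ValueError(f"unknown task: {task}")
    command = ["python3", "deepprofiler", f"--root={root}", "--config", config]
    if gpu is not None:
        command += ["--gpu", str(gpu)]
    if cores is not None:
        command += ["--cores", str(cores)]
    if exp is not None:
        command += ["--exp", exp]
    command.append(task)  # the task comes last, after all options
    return command


command = build_dp_command("/home/ubuntu/project/", "filename.json", "profile", gpu=0)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths point at a real project.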

13 of 25

The Profiling pipeline

The full profiling pipeline with DeepProfiler looks like the following (tool for each step in parentheses):

  1. Acquire images
  2. Extract cell locations (with CellProfiler)
  3. Move data to DeepProfiler format
  4. Compress images and perform illumination correction (DP prepare)
  5. Export single-cell crops to prepare for training (DP export-sc)
  6. Train model on data (DP train), or use a pretrained model
  7. Run inference (DP profile)
  8. Aggregate onto well level (Pycytominer or DeepProfiler Aggregate)
  9. Process data (Pycytominer)
  10. Run evaluation (Cytominer-eval)
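
The "aggregate onto well level" step collapses many single-cell feature vectors into one profile per well. The sketch below shows the idea with a per-feature median in plain Python. It does not use the Pycytominer or DeepProfiler API, and the choice of median is an assumption, since the slide does not name the statistic.

```python
from collections import defaultdict
from statistics import median


def aggregate_to_wells(cells):
    """Collapse single-cell features into one median profile per (plate, well).

    `cells` is an iterable of (plate, well, feature_vector) tuples.
    """
    grouped = defaultdict(list)
    for plate, well, features in cells:
        grouped[(plate, well)].append(features)
    # zip(*vectors) iterates over features, so each median is taken across cells.
    return {
        key: [median(column) for column in zip(*vectors)]
        for key, vectors in grouped.items()
    }


# Made-up single-cell features with two dimensions each.
cells = [
    ("P1", "A01", [1.0, 10.0]),
    ("P1", "A01", [3.0, 30.0]),
    ("P1", "A01", [2.0, 20.0]),
    ("P1", "A02", [5.0, 50.0]),
]
well_profiles = aggregate_to_wells(cells)
```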

14 of 25

Hands-on time after the break

  • Download example LINCS data
  • Familiarize yourself with the DeepProfiler structure and files
  • Prepare the images with DP
  • Profile the data with either a pre-trained or a self-trained model
  • Aggregate the profiles
  • Be happy

15 of 25

10 min BREAK

16 of 25

Hands-on time!

  • Download example LINCS data
  • Familiarize yourself with the DP structure and files
  • Run Docker container and mount the example data
  • Prepare the images with DP
  • Profile the data with either a pre-trained or a self-trained model
  • Aggregate the profiles
  • Extra: Visualize your results
  • Don’t Panic
  • Form groups of two: pair a command-line “expert” with a “non-expert”
  • Use breakout rooms if you are online
  • All links are on the Helper Page
  • Ask Michael or Niranj, or use the Slack channel
  • Log all problems into the Error log doc!
  • Comment on any issues and ideas in the Wiki
  • The Helper Page also provides expected output
  • If you get really stuck, I can give you the correct data

17 of 25

The Profiling pipeline

Recap of the pipeline diagram from slide 13 (image acquisition through evaluation).

18 of 25

Common mistakes!

  • Images have no matching location file, or location files have no matching image
  • Location coordinates are incorrect
  • CUDA / cuDNN version or path problems
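
The first mistake can be caught before profiling by comparing the sites listed in the index with the sites that have a location file. The sketch below assumes each site is identified by a (plate, well, site) tuple; how those tuples are collected from disk depends on the project layout and is left out. `find_mismatches` is a hypothetical helper.

```python
def find_mismatches(image_sites, location_sites):
    """Return (images without locations, locations without images), both sorted."""
    images = set(image_sites)
    locations = set(location_sites)
    return sorted(images - locations), sorted(locations - images)


# Made-up site keys: one image lacks a location file and one location file lacks an image.
image_sites = [("P1", "A01", 1), ("P1", "A01", 2), ("P1", "A02", 1)]
location_sites = [("P1", "A01", 1), ("P1", "A02", 1), ("P1", "A03", 1)]

no_locations, no_images = find_mismatches(image_sites, location_sites)
```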

19 of 25

Break time!

20 of 25

Feedback from hands-on

Group by group:

  • Go through error log
  • Go through Wiki comments
  • General feedback

Can someone document in the Error Log?

21 of 25

Discussion on Distributed DP

Speeds and resource requirements (per plate)

  • Preparing: ~4 CPU hours
  • Profiling on a P3 GPU (NVIDIA Tesla V100): ~2.5 GPU hours
  • Profiling on an NVIDIA A100: 0.25 GPU hours
  • Aggregation: ~0.05 CPU hours
  • For comparison, CellProfiler: 288 CPU hours, about $10 per plate (vs. about $0.40 with DeepProfiler)

Storage (per plate)

  • Images: ~30 GB
  • Compressed images: ~6 GB
  • Profiles: 3 GB

To consider

  1. Acquiring location files
  2. Distributed prepare (CPU)
  3. Distributed profiling (GPU)
  4. Distributed aggregate (CPU)
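
For planning a distributed run, the per-plate figures above can be scaled to a whole dataset. The sketch below is plain arithmetic on the numbers from this slide; the function name and dictionary keys are hypothetical, and the estimate ignores parallelism, data transfer, and failed jobs.

```python
# Per-plate figures taken from the slide.
PER_PLATE = {
    "prepare_cpu_h": 4.0,
    "aggregate_cpu_h": 0.05,
    "profile_gpu_h_v100": 2.5,
    "profile_gpu_h_a100": 0.25,
    "raw_gb": 30.0,
    "compressed_gb": 6.0,
    "profiles_gb": 3.0,
}


def estimate(plates, gpu="a100"):
    """Scale the per-plate compute and storage figures to `plates` plates."""
    profile_hours = PER_PLATE[f"profile_gpu_h_{gpu}"]
    return {
        "cpu_hours": plates * (PER_PLATE["prepare_cpu_h"] + PER_PLATE["aggregate_cpu_h"]),
        "gpu_hours": plates * profile_hours,
        "raw_gb": plates * PER_PLATE["raw_gb"],
        "compressed_gb": plates * PER_PLATE["compressed_gb"],
        "profiles_gb": plates * PER_PLATE["profiles_gb"],
    }


plan = estimate(100, gpu="a100")
for name, value in plan.items():
    print(f"{name}: {value:.1f}")
```

With these numbers, 100 plates need 25 GPU hours on an A100, compared with 250 GPU hours on a V100.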

Can someone document in the Error Log?

22 of 25

Architecture

23 of 25

Future outlook (with Juan)

  • Manuscript writing
  • Extended documentation writing
  • And more...

Document in the Error Log?

24 of 25

Training models

We can talk about:

  • LINCS data
  • Differences between pre-trained and CP profiles
  • Training experiments
  • Batch effects
  • Overfitting models
  • Regularization techniques
  • Using self-trained models vs pre-trained models

25 of 25

Weakly Supervised Representation Learning

Main goal: population-level profiling

Auxiliary task: single-cell treatment classification (CNN with a softmax output)

  • Cells that received the same drug should look similar
  • Cells that received different drugs (Drug A vs. Drug B) should look different

Caicedo, et al. 2018 CVPR
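
The auxiliary task can be pictured as ordinary softmax classification: the CNN emits one logit per treatment for each cell, and training minimizes the cross-entropy against the treatment label. The sketch below shows only that loss in plain Python, with made-up logits and no network. It is an illustration of the general technique, not the DeepProfiler training code.

```python
import math


def softmax(logits):
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the correct treatment class."""
    return -math.log(softmax(logits)[true_class])


# Made-up logits for one cell over two treatments (index 0 = Drug A, 1 = Drug B).
cell_logits = [2.0, 0.5]
loss_if_drug_a = cross_entropy(cell_logits, 0)  # confident and correct: small loss
loss_if_drug_b = cross_entropy(cell_logits, 1)  # confident and wrong: larger loss
```

The profile itself is taken from the features the network learns while solving this task, which are then aggregated at the population level.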