A Hitchhiker's Guide to DeepProfiler
Jamboree day 11/19/21
Michael Bornholdt
IMAGING
PLATFORM
Today's Workshop
Mandatory intro to Profiling
My last 6 months
Project in one line:
“I designed a protocol for robustly and scalably quantifying cellular states from images using deep neural networks; the JUMP consortium will use this protocol to profile >2B cells to generate a dataset that will be made public in 2022”
Motivation
Why do we want to use DeepProfiler over CellProfiler?
Background of DeepProfiler
Introduction to DeepProfiler
Topics
Installing DP
Packages:
"beautifulsoup4>=4.6",
"click>=6.7",
"comet_ml>=1.0",
"efficientnet==1.1.1",
"gpyopt>=1.2",
"lxml>=4.2",
"numpy>=1.13",
"pandas>=0.23.0",
"scikit-image>=0.14.0",
"scikit-learn>=0.19.0",
"scipy>=1.1",
"comet-ml>=3.1.6",
"tensorflow==2.5.*",
"tensorflow_addons",
"tqdm>=4.62",
Requires: TensorFlow 2.5
Make sure the matching CUDA and cuDNN versions are installed and that their library paths are set.
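A quick way to confirm that an environment matches the pins above is to compare installed versions against them. The sketch below uses only the Python standard library. The helper names (`matches_pin`, `check_environment`) are made up for illustration and are not part of DeepProfiler, and only two of the pinned packages are checked.

```python
from importlib import metadata

# Pins copied from the package list above; only the '=='-style pins are checked.
PINNED_PREFIXES = {
    "tensorflow": "2.5.",      # tensorflow==2.5.*
    "efficientnet": "1.1.1",   # efficientnet==1.1.1
}

def matches_pin(installed, prefix):
    """Return True when an installed version string satisfies an '=='-style pin."""
    if prefix.endswith("."):
        # A pin like 2.5.* accepts any 2.5.x release.
        return installed.startswith(prefix)
    return installed == prefix

def check_environment(pins=PINNED_PREFIXES):
    """Map each pinned package to 'ok', 'missing', or the mismatching version."""
    report = {}
    for name, prefix in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if matches_pin(installed, prefix) else installed
    return report

if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

This only checks Python packages. CUDA and cuDNN still have to be verified separately on the machine.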
Input / Output of DP
Images and location files
Location files:
Images:
Index and Config
df.columns
Index(['Metadata_Plate', 'Metadata_Well', 'Metadata_Site', 'Metadata_broad_sample', 'Metadata_moa', 'Metadata_mmoles_per_liter', 'Metadata_dose_recode', 'RNA', 'ER', 'AGP', 'Mito', 'DNA', 'Concentration', 'Treatment_ID', 'Compound', 'pert_iname', 'Treatment_Replicate', 'Treatment', 'Plate_Map_Name', 'Split'], dtype='object')
"train": {
    "partition": {
        "targets": [
            "Compound"
        ],
        "split_field": "Split",
        "training_values": [0],
        "validation_values": [1]
    },
    "model": {
        "name": "efficientnet",
        "augmentations": false,
        "crop_generator": "sampled_crop_generator",
        "metrics": ["accuracy", "top_k"],
        "epochs": 20,
        "initialization": "ImageNet",
        "params": {
            "learning_rate": 0.02,
            "batch_size": 256,
            "conv_blocks": 0,
            "label_smoothing": 0.0,
            "feature_dim": 256,
            "pooling": "avg"
The most important file in DP!
Index: this file contains information about the experiment, with the minimum columns needed for sampling and for running learning algorithms.
Config: the configuration file is a text file in JSON format that organizes the settings for one experiment.
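To make the link between the two files concrete, here is a minimal sketch of how the `partition` block of the config could select rows of the index. It is not DeepProfiler's own code. The config text is the fragment above reduced to its partition block and closed so that it parses, the three-row index is invented, and `split_index` is a hypothetical helper.

```python
import csv
import io
import json

# Assumption: the truncated fragment above, reduced to its partition block
# and closed so that it parses as JSON.
CONFIG_TEXT = """
{
  "train": {
    "partition": {
      "targets": ["Compound"],
      "split_field": "Split",
      "training_values": [0],
      "validation_values": [1]
    }
  }
}
"""

# Hypothetical three-row index; real index files carry more columns.
INDEX_TEXT = """Metadata_Plate,Metadata_Well,Metadata_Site,Compound,Split
P1,A01,1,DMSO,0
P1,A02,1,drug_x,0
P1,A03,1,drug_x,1
"""

def split_index(config, rows):
    """Split index rows into training and validation lists using the config."""
    partition = config["train"]["partition"]
    field = partition["split_field"]
    train_values = set(partition["training_values"])
    val_values = set(partition["validation_values"])
    train, val = [], []
    for row in rows:
        value = int(row[field])  # CSV cells are strings; the config uses ints
        if value in train_values:
            train.append(row)
        elif value in val_values:
            val.append(row)
    return train, val

config = json.loads(CONFIG_TEXT)
rows = list(csv.DictReader(io.StringIO(INDEX_TEXT)))
train_rows, val_rows = split_index(config, rows)
print(len(train_rows), "training rows,", len(val_rows), "validation rows")
```

The point of the sketch: `split_field` names a column of the index, and `training_values` / `validation_values` are the values of that column assigned to each partition.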
Index file
Must have:
How do we create and manage these index files?
Command line
python3 deepprofiler --root=/home/ubuntu/project/ --config filename.json train
python3 deepprofiler --root=/home/ubuntu/project/ --config filename.json profile
Options:
  --root PATH          Root directory for DeepProfiler experiment
  --config TEXT        Path to existing config file
  --cores INTEGER      Number of CPU cores for parallel processing (all=0)
  --gpu TEXT           GPU device id
  --exp TEXT           Name of experiment
  --single-cells TEXT  Name of single cell export folder
  --metadata TEXT      Name of metadata file, default index.csv
  --logging TEXT       Path to file with comet.ml API key
  --help               Show this message and exit.
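When several steps are run back to back, it can help to build the command lines in a script. The sketch below only assembles and prints the two commands shown above. `deepprofiler_command` is a hypothetical helper, and the only flags it uses are ones listed in the options.

```python
import shlex

def deepprofiler_command(root, config, step, gpu=None, exp=None):
    """Build the argument list for one DeepProfiler step (not executed here)."""
    args = ["python3", "deepprofiler", f"--root={root}", "--config", config]
    if gpu is not None:
        args += ["--gpu", str(gpu)]
    if exp is not None:
        args += ["--exp", exp]
    args.append(step)  # the step name comes after the options, as in the slide
    return args

for step in ("train", "profile"):
    cmd = deepprofiler_command("/home/ubuntu/project/", "filename.json", step)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` instead of a shell string avoids quoting problems with paths that contain spaces.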
The Profiling pipeline
The full profiling pipeline with DeepProfiler looks like the following (tool for each step in parentheses):
Acquire images
Extract cell locations (with CellProfiler)
Move data to DeepProfiler format
Compress images and perform illumination correction (DP prepare)
Export single-cell crops to prepare for training (DP export-sc)
Train model on data (DP train), or use a pretrained model
Run inference (DP profile)
Aggregate onto well level (Pycytominer or DeepProfiler Aggregate)
Process data (Pycytominer)
Run evaluation (Cytominer-eval)
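As an illustration of the "aggregate onto well level" step, the sketch below collapses single-cell feature rows into one profile per well. It is not the Pycytominer or DeepProfiler API. The median is one common choice and is an assumption here, and the feature names `f0` and `f1` and their values are invented.

```python
from collections import defaultdict
from statistics import median

def aggregate_to_wells(cells, feature_names):
    """Collapse single-cell feature rows into one median profile per well."""
    by_well = defaultdict(list)
    for cell in cells:
        key = (cell["Metadata_Plate"], cell["Metadata_Well"])
        by_well[key].append(cell)
    profiles = {}
    for key, members in by_well.items():
        profiles[key] = {
            name: median(member[name] for member in members)
            for name in feature_names
        }
    return profiles

# Hypothetical single-cell features for two wells.
cells = [
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "f0": 1.0, "f1": 4.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "f0": 3.0, "f1": 6.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A01", "f0": 2.0, "f1": 8.0},
    {"Metadata_Plate": "P1", "Metadata_Well": "A02", "f0": 5.0, "f1": 1.0},
]
well_profiles = aggregate_to_wells(cells, ["f0", "f1"])
print(well_profiles[("P1", "A01")])
```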
Hands-on time after the break
10 min BREAK
Hands-on time!
Common mistakes!
Break time!
Feedback from hands-on
Group by group:
Can someone document in the Error Log?
Discussion on Distributed DP
Speeds and resource requirements
Preparing: ~4 CPU hours/plate
Profiling:
Aggregation: ~0.05 CPU hours/plate
---
CellProfiler: 288 CPU hours/plate = $10 per plate (vs. $0.40 per plate with DeepProfiler)
To consider
Storage
Images: ~30 GB per plate
Compressed images: ~6 GB per plate
Profiles: 3 GB per plate
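For planning, the per-plate storage figures above scale linearly with the number of plates. A small sketch of the arithmetic follows. The batch size of 100 plates is a made-up example and `storage_estimate` is a hypothetical helper.

```python
# Per-plate figures taken from the storage list above (GB).
STORAGE_GB_PER_PLATE = {"images": 30, "compressed_images": 6, "profiles": 3}

def storage_estimate(n_plates, keep_raw_images=True):
    """Return total storage in GB for n_plates."""
    per_plate = dict(STORAGE_GB_PER_PLATE)
    if not keep_raw_images:
        # Drop the raw images once the compressed copies exist.
        per_plate.pop("images")
    return n_plates * sum(per_plate.values())

n_plates = 100  # hypothetical batch size
print(storage_estimate(n_plates), "GB with raw images")
print(storage_estimate(n_plates, keep_raw_images=False), "GB without raw images")
```

With these figures, keeping only compressed images and profiles takes 9 GB per plate instead of 39 GB.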
Can someone document in the Error Log?
Architecture
Future outlook (with Juan)
Document in the Error Log?
Training models
We can talk about:
Weakly Supervised Representation Learning
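Main goal: Population-level profiling
Auxiliary task: Single-cell treatment classification
[Figure: single cells pass through a CNN with a softmax output. Cells given the same treatment (Drug A, or Drug B) should look similar; cells given different treatments should look different.]
Caicedo et al., CVPR 2018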
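The two roles on this slide can be shown with toy numbers: the auxiliary task scores each cell against its treatment label with a softmax loss, while the main goal summarizes a population of cells into one profile. The sketch below is plain Python and is not the DeepProfiler training code. The logits and embeddings are invented, and averaging embeddings is an assumed aggregation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Auxiliary-task loss for one cell: -log p(treatment label)."""
    return -math.log(softmax(logits)[label])

def population_profile(embeddings):
    """Main-goal output: average the single-cell embeddings of a population."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

# Hypothetical logits for one cell over two treatments (Drug A = 0, Drug B = 1).
cell_logits = [2.0, 0.0]
print("loss if the cell is Drug A:", cross_entropy(cell_logits, 0))
print("loss if the cell is Drug B:", cross_entropy(cell_logits, 1))

# Hypothetical 2-D embeddings of two cells treated with Drug A.
drug_a_cells = [[1.0, 0.0], [3.0, 2.0]]
print(population_profile(drug_a_cells))
```

The classification loss is only the training signal. What is kept afterwards is the embedding, which is what the population-level profile is built from.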