CanDLE: Illuminating Biases in Transcriptomic Pan-Cancer Diagnosis
10/21/2023
Gabriel Mejía
Pablo Arbelaez
Natasha Bloch
Paper ID 8
Why Molecular Cancer Diagnosis?
Cancer Prevalence and Diversity
19.2 M New Cases (2020)
9.9 M Deaths (2020)
But Wait… What Public Data Do We Have?
Public Databases
TCGA: The Cancer Genome Atlas
GTEx: Genotype-Tissue Expression Project
TCGA and GTEx Data Are NOT Comparable!!!
Reference Genome, Alignment and Quantification Algorithm Variability
Efforts to Join the GTEx and the TCGA
Vivian et al. Dataset
Wang et al. Dataset
Quinn et al. Tissue Detector
Hong et al. Multitask MLPs
Related Work
So… Problem Solved, Right?
TCGA
Translation Bias
GTEx
Linear SVM
Both Datasets Present Empirical Translation Biases
Previous Results Lose Clinical Relevance
Z-score Batch Standardization Corrects Translation Biases
[Figure: two panels with axes labeled Gene 1, Gene 2, Gene 3]
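The z-score standardization named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a "batch" means the source dataset (TCGA or GTEx) and that each gene is standardized independently within its batch. The function name `zscore_per_batch` and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_batch(samples, batches, eps=1e-8):
    """Standardize each gene to zero mean and unit variance within each batch.

    samples: list of expression vectors (one list of floats per sample).
    batches: list of batch labels, one per sample (assumed: source dataset).
    """
    n_genes = len(samples[0])
    out = [None] * len(samples)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        # Per-gene statistics come only from the samples of this batch.
        mus = [mean(samples[i][g] for i in idx) for g in range(n_genes)]
        sds = [pstdev([samples[i][g] for i in idx]) for g in range(n_genes)]
        for i in idx:
            out[i] = [(samples[i][g] - mus[g]) / (sds[g] + eps)
                      for g in range(n_genes)]
    return out


# Toy data: gene 0 is shifted by +100 in the second batch.
toy = [[1.0, 10.0], [3.0, 14.0], [101.0, 0.0], [103.0, 4.0]]
labels = ["TCGA", "TCGA", "GTEx", "GTEx"]
standardized = zscore_per_batch(toy, labels)
```

In the toy data the constant offset between the two batches disappears after standardization, which is the kind of dataset-level shift a linear classifier could otherwise exploit.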
CanDLE: Cancer Diagnosis Logistic Engine
Which cancer/tissue?
Classification & All-vs-One Detection
63 Neurons for Multilabel Classification
2 Neurons for All-vs-one Detection
Simplest Gradient Based Approach
[Diagram: Gene Expression Vector (0.2, 0.0, 1.2, -0.5, -0.1, …) → SoftMax → Class Probability Vector (0.01, 0.0, 0.7, 0.2, 0.05, …)]
Multinomial Logistic Regression
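As a rough sketch of what multinomial logistic regression computes at inference time, the snippet below applies one linear layer followed by a softmax. The weights, biases, and the three-class toy setup are invented for illustration (the slides describe 63 output neurons); training by gradient descent is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """One linear layer followed by softmax.

    x: gene expression vector (length n_genes).
    weights: one weight vector per class (n_classes x n_genes).
    biases: one bias per class.
    """
    logits = [sum(w_g * x_g for w_g, x_g in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(logits)


# Toy example: 5 genes, 3 classes (hypothetical parameters).
x = [0.2, 0.0, 1.2, -0.5, -0.1]
W = [[0.1, 0.0, -0.3, 0.2, 0.0],
     [0.0, 0.5, 0.8, -0.1, 0.3],
     [-0.2, 0.1, 0.0, 0.4, -0.5]]
b = [0.0, 0.1, -0.1]
probs = predict_proba(x, W, b)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

With two output neurons instead of 63, the same structure would cover the all-vs-one detection setting mentioned above.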
Previous Findings for Representation Learning
[1] Smith, A., et al., 2020: Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data. DOI: 10.1186/s12859-020-3427-8
Experimental Setup
Random 60/20/20% Train/Val/Test Partition
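A random 60/20/20 partition can be sketched as below. The slides do not say whether the split was stratified by class or which seed was used, so this assumes a plain shuffled split; `split_60_20_20` and the seed value are hypothetical.

```python
import random


def split_60_20_20(n_samples, seed=0):
    """Return disjoint train/val/test index lists in a 60/20/20 ratio."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = (n_samples * 60) // 100
    n_val = (n_samples * 20) // 100
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # remainder goes to test
    return train, val, test


train_idx, val_idx, test_idx = split_60_20_20(100)
```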
Main Results: Classification
CanDLE’s Simplicity Generalizes Better Once Biases Are Removed
*Reimplementation
Outperforms the State of the Art by +7.3% Balanced Accuracy on the Test Set
[2] Hong, J., et al., 2022: A deep learning model to classify neoplastic state and tissue origin from transcriptomic data. DOI: 10.1038/s41598-022-13665-5
Hong’s Method Takes Advantage of Translation Biases
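Balanced accuracy, the metric reported above, is the mean of per-class recall, so small classes count as much as large ones. The sketch below shows the computation on made-up labels; it is not the evaluation code behind the reported numbers.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels: class "A" has recall 3/4, class "B" has recall 1/2.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
score = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy would be 4/6, while balanced accuracy is 0.625, because the smaller class weighs equally in the average.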
Main Results: All-Vs-One Detection
The Worst-Performing Classes Generally Have Few Training Samples
Performance Comparable to the State-of-the-Art Method by Quinn et al.
[3] Quinn, T., et al., 2019: Cancer as a Tissue Anomaly: Classifying Tumor Transcriptomes Based Only on Healthy Data. DOI: 10.3389/fgene.2019.00599
Select the Top 1,000 Genes by Absolute Value for Each Cancer Class
Rank Genes by the Number of Classes for Which They Were Selected
1,982 Genes Important for at Least 3 Cancers Were in the Final List
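The selection steps above can be sketched as follows. This assumes that "absolute value" refers to the per-class weights learned by the model, which the slides do not state explicitly; the function name, parameters, and toy weights are hypothetical.

```python
from collections import Counter


def recurrent_top_genes(weights, gene_names, top_k=1000, min_classes=3):
    """Count how often each gene is among the top_k by |weight| per class.

    weights: dict mapping class name -> list of per-gene weights (assumed).
    Returns (gene, count) pairs with count >= min_classes, most frequent first.
    """
    counts = Counter()
    for w in weights.values():
        # Rank gene indices by absolute weight, largest first.
        ranked = sorted(range(len(w)), key=lambda g: abs(w[g]), reverse=True)
        counts.update(gene_names[g] for g in ranked[:top_k])
    return [(gene, n) for gene, n in counts.most_common() if n >= min_classes]


# Toy example: 4 genes, 3 classes, top 2 genes per class, keep genes seen twice.
names = ["G1", "G2", "G3", "G4"]
toy_weights = {
    "c1": [0.9, -0.1, 0.5, 0.0],
    "c2": [-0.8, 0.7, 0.1, 0.0],
    "c3": [0.95, -0.6, 0.9, 0.1],
}
selected = recurrent_top_genes(toy_weights, names, top_k=2, min_classes=2)
```

With the defaults (`top_k=1000`, `min_classes=3`), the same procedure mirrors the thresholds quoted on the slide.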
Interpretation
Gene Ontology Biological Processes Enrichment Analysis
Developmental and Morphogenesis Pathways Were Over-Represented
Take Home Messages
Code Availability
https://github.com/g27182818/CanDLE
Thank You for Your Time! Questions?
Gabriel Mejía
gm.mejia@uniandes.edu.co
Pablo Arbelaez
pa.arbelaez@uniandes.edu.co
Natasha Bloch
n.blochm@uniandes.edu.co
Biomedical Computer Vision