Somatic Mutation from Separated Haplotypes (SMUSH)
HackSeq Team #8a
Amanjeev Sethi
Eric Zhao
Hua Ling
Patrick Marks (Project lead)
Peng Zhang
Samantha Kohli
The Problem
The Unphased Scenario
Matched Normal
25% Tumor
Difficult to decide whether this variant is somatic from tumor data alone
What a real somatic mutation looks like
25% Tumor
Alt allele covers only a subset of the reads on one haplotype -- not expected for a germline variant
Matched Normal
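The contrast between the unphased and phased views can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a diploid site with no copy-number change and a somatic variant that is clonal and heterozygous within the tumor cells. The function name `expected_alt_fractions` is hypothetical.

```python
def expected_alt_fractions(tumor_fraction, somatic=True):
    """Return (unphased_vaf, carrier_hap_af, other_hap_af).

    Assumptions (not from the slides): diploid site, no copy-number change,
    somatic variant clonal and heterozygous within the tumor cells.
    """
    if somatic:
        # Only the tumor-derived copies of the carrier haplotype hold the alt allele.
        carrier_hap_af = tumor_fraction
    else:
        # Germline het: every copy of the carrier haplotype holds the alt allele.
        carrier_hap_af = 1.0
    other_hap_af = 0.0
    # Without phasing, reads from both haplotypes are pooled.
    unphased_vaf = (carrier_hap_af + other_hap_af) / 2
    return unphased_vaf, carrier_hap_af, other_hap_af


for f in (0.25, 0.50, 0.75, 1.00):
    vaf, carrier, other = expected_alt_fractions(f)
    print(f"tumor={f:.0%}  unphased VAF={vaf:.3f}  "
          f"carrier hap AF={carrier:.2f}  other hap AF={other:.2f}")
```

Under these assumptions, a 25% tumor sample gives a pooled allele fraction of 0.125 but a carrier-haplotype fraction of 0.25 with the other haplotype at 0, whereas a germline het sits at 1.0 on one haplotype and 0 on the other.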
Somatic Variant Concept
We model four scenarios:
Dataset and Results
Datasets
Validation Examples
Pick likely somatic mutation sites based on the difference in minor allele frequency (MAF) between the two haplotypes across the tumor titration series.
Figure: Spaghetti plot of the 31 selected sites
Shown here is the fraction of tumor genomes (X-axis, Tumor Fraction) against min_dif (Y-axis, Minor Allele Fraction Difference). Each line tracks the same position across samples.
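One way such a site selection could be sketched is shown below. The exact definition of min_dif and the selection thresholds are not given in the slides, so everything here is an assumption: the per-haplotype difference is taken as the absolute difference in alt-allele fraction, and the function names, the `(ref, alt)` count layout, and both thresholds are hypothetical.

```python
def alt_fraction(ref_reads, alt_reads):
    """Alt-allele fraction on one haplotype; 0.0 when there is no coverage."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def haplotype_af_difference(hap1_counts, hap2_counts):
    """Absolute alt-fraction difference between haplotypes; counts are (ref, alt)."""
    return abs(alt_fraction(*hap1_counts) - alt_fraction(*hap2_counts))


def select_candidate_sites(titration, normal_max=0.05, min_increase=0.2):
    """Keep sites whose haplotype difference is near zero in the normal sample
    and grows by at least `min_increase` at the highest tumor fraction.

    `titration` maps site -> list of (hap1_counts, hap2_counts), ordered by
    increasing tumor fraction with the 0% tumor sample first (assumed layout).
    """
    selected = []
    for site, samples in titration.items():
        diffs = [haplotype_af_difference(h1, h2) for h1, h2 in samples]
        if diffs[0] <= normal_max and diffs[-1] - diffs[0] >= min_increase:
            selected.append(site)
    return selected


# Toy read counts, invented for illustration only.
titration = {
    "site_a": [((40, 0), (40, 0)), ((30, 10), (40, 0)), ((20, 20), (40, 0))],
    "site_b": [((0, 40), (40, 0))] * 3,  # flat difference of 1.0: germline-het-like
}
print(select_candidate_sites(titration))
```

In this toy input, `site_a` rises with tumor fraction and is kept, while `site_b` shows the same large difference in every sample, including the normal, and is dropped.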
Somatic Case:
Model evaluation
For the 31 selected sites, the model called all 31 as somatic in every sample with 25% or more tumor genomes. In the 0% tumor (normal) sample, only one site was misclassified as somatic.
Variant Call Type | 0% Tumor | 25% Tumor | 50% Tumor | 75% Tumor | 100% Tumor |
Germline | 30 | 0 | 0 | 0 | 0 |
Somatic | 1 | 31 | 31 | 31 | 31 |
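The counts in the table can be summarized with a few lines of arithmetic. This sketch assumes that every selected site is a true somatic site in the tumor-containing samples and that any somatic call in the 0% tumor sample is a false positive; the variable names are illustrative.

```python
tumor_percent = [0, 25, 50, 75, 100]
germline_calls = [30, 0, 0, 0, 0]
somatic_calls = [1, 31, 31, 31, 31]

# Tumor-containing samples: a somatic call is a true positive,
# a germline call is a false negative (assumption stated above).
tp = sum(s for pct, s in zip(tumor_percent, somatic_calls) if pct > 0)
fn = sum(g for pct, g in zip(tumor_percent, germline_calls) if pct > 0)

# 0% tumor sample: a somatic call is a false positive.
fp = somatic_calls[0]
tn = germline_calls[0]

sensitivity = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
print(f"sensitivity={sensitivity:.3f}  false_positive_rate={false_positive_rate:.3f}")
```

Because the 31 sites were chosen as likely somatic in the first place, these figures describe agreement on a pre-selected set rather than genome-wide performance.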
Preliminary Results -- Filtering and FN issues to be addressed
Figure: Non-Somatic Events and Somatic Events by Tumor Fraction (axes: Hap1 AF, Hap2 AF; legend: REF, Germline HET, Germline HOM, Somatic)
Future Work
More filtering, more evaluation
Refine models
Evaluate sensitivity, PPV, and the lower limit of detection
Explore application to variant types other than SNVs
Limitations
Somatic Variant Calling Model
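The slides do not spell out the model itself, so the following is only a minimal sketch of how a four-way comparison over per-haplotype read counts might look. It borrows the class labels that appear in this deck (REF, Germline HET, Germline HOM, Somatic); the binomial likelihood, the error rate, the grid of somatic fractions, and all function names are assumptions made for illustration.

```python
import math

ERROR_RATE = 0.01  # assumed per-read error rate (hypothetical)
SOMATIC_GRID = [i / 20 for i in range(2, 19)]  # assumed fractions 0.10 .. 0.90


def binom_loglik(counts, p):
    """Binomial log-likelihood of (ref, alt) read counts given alt fraction p."""
    ref, alt = counts
    total = ref + alt
    log_choose = (math.lgamma(total + 1) - math.lgamma(alt + 1)
                  - math.lgamma(ref + 1))
    return log_choose + alt * math.log(p) + ref * math.log(1 - p)


def site_loglik(hap1, hap2, p1, p2):
    """Joint log-likelihood, treating the two haplotypes as independent."""
    return binom_loglik(hap1, p1) + binom_loglik(hap2, p2)


def classify_site(hap1, hap2):
    """Return the best-scoring label for (ref, alt) counts on each haplotype."""
    lo, hi = ERROR_RATE, 1 - ERROR_RATE
    scores = {
        "REF": site_loglik(hap1, hap2, lo, lo),
        "Germline HOM": site_loglik(hap1, hap2, hi, hi),
        # The alt allele may sit on either haplotype.
        "Germline HET": max(site_loglik(hap1, hap2, hi, lo),
                            site_loglik(hap1, hap2, lo, hi)),
        # Somatic: an intermediate fraction on one haplotype, none on the other.
        "Somatic": max(max(site_loglik(hap1, hap2, f, lo),
                           site_loglik(hap1, hap2, lo, f))
                       for f in SOMATIC_GRID),
    }
    return max(scores, key=scores.get)


print(classify_site((30, 10), (40, 0)))  # alt on a subset of one haplotype
```

The lower end of the somatic grid acts as a crude detection floor: with fractions much below 0.10, a single erroneous alt read starts to look as plausible under the somatic scenario as under REF.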
Problematic Case: Mixed Haplotype in Normal Sample
Matched Normal
25% Tumor
75% Tumor
Mixed haplotypes in the germline sample generate FP somatic calls. Requires orthogonal filters.
False Negatives -- Phasing Issues
Matched Normal
50% Tumor
Isolated somatic variant drives ‘false’ phasing -- somatic variants look like germline
Requires changes to phasing method / incorporation of phasing likelihoods