1 of 15

Somatic Mutation from Separated Haplotypes (SMUSH)

HackSeq Team #8a

Amanjeev Sethi

Eric Zhao

Hua Ling

Patrick Marks (Project lead)

Peng Zhang

Samantha Kohli

2 of 15

The Problem

  • Tumor sequencing is increasingly common for molecular diagnosis of cancer
  • Matched normal sample is often not available due to cost & logistics
  • Comparison with matched normal is gold-standard for distinguishing somatic variants from germline background
  • Can long-range phasing information help?

3 of 15

The Unphased Scenario

Matched Normal

25% Tumor

Difficult to decide if this variant is somatic solely from Tumor data

4 of 15

What a real somatic mutation looks like

25% Tumor

Alt-allele covers a subset of the haplotype -- not expected in germline

Matched Normal

5 of 15

Somatic Variant Concept

  • Assumptions:
    • Germline variants are pure within a haplotype
    • Somatic variants show up in a fraction (a) of a haplotype’s reads
    • Some error rate (e)

We model four scenarios:

  • Reference (0/0)
  • Germline Het (0/1)
  • Germline Hom (1/1)
  • Somatic
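
The four scenarios above can be sketched as a per-haplotype binomial comparison. This is a minimal illustration, not the team's actual model: the binomial read model, the function names, the default values of a and e, and taking the better of the two haplotype assignments are all assumptions made for the example.

```python
import math

def log_binom(k, n, p):
    """Log-probability of k alt reads out of n when each read is alt with probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

def scenario_log_likelihoods(alt1, n1, alt2, n2, a=0.25, e=0.01):
    """Score the four scenarios from phased read counts at one site.

    alt1/n1 and alt2/n2 are alt and total read counts on haplotypes 1 and 2.
    a is the assumed somatic fraction, e the assumed error rate (0 < e < 1).
    """
    pure_ref = e                         # haplotype carries the reference allele
    pure_alt = 1 - e                     # haplotype carries the alt allele
    mixed = a * (1 - e) + (1 - a) * e    # alt allele in a fraction a of the reads

    def both(p1, p2):
        return log_binom(alt1, n1, p1) + log_binom(alt2, n2, p2)

    def either(p_carrier, p_other):
        # The variant may sit on either haplotype; keep the better assignment.
        return max(both(p_carrier, p_other), both(p_other, p_carrier))

    return {
        "reference": both(pure_ref, pure_ref),
        "germline_het": either(pure_alt, pure_ref),
        "germline_hom": both(pure_alt, pure_alt),
        "somatic": either(mixed, pure_ref),
    }

def classify(alt1, n1, alt2, n2, a=0.25, e=0.01):
    """Return the name of the highest-scoring scenario."""
    scores = scenario_log_likelihoods(alt1, n1, alt2, n2, a, e)
    return max(scores, key=scores.get)
```

For example, `classify(5, 20, 0, 20)` describes a site where the alt allele appears in only part of one haplotype's reads, which is the pattern the somatic scenario is meant to capture.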

6 of 15

Dataset and Results

7 of 15

Datasets

  • HCC1954 - A very well studied cancer cell line
  • Matched normal, pure tumor, and 3 mixtures between the two
  • Sequenced with 10X linked-read technology

8 of 15

Validation Examples

Pick likely somatic mutation sites based on the difference in minor allele frequency (MAF) between the two haplotypes across the tumor titration.

Figure: Spaghetti plot of the 31 selected sites (somatic case). X-axis: tumor fraction (fraction of tumor genomes). Y-axis: minor allele fraction difference between the two haplotypes (min_dif). Each line traces one site across the samples.
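
A rough sketch of this selection step is shown below. It assumes that the plotted quantity is the absolute difference between the two haplotypes' alt-allele fractions; the function names, the input layout, and both thresholds are hypothetical and chosen only for illustration.

```python
def allele_fraction(alt, total):
    """Alt-allele fraction on one haplotype; 0.0 when the haplotype has no reads."""
    return alt / total if total else 0.0

def haplotype_af_difference(hap1, hap2):
    """Absolute difference in alt-allele fraction between two haplotypes.

    hap1 and hap2 are (alt_reads, total_reads) tuples.
    """
    return abs(allele_fraction(*hap1) - allele_fraction(*hap2))

def looks_somatic(site_counts, normal_max=0.05, tumor_min=0.2):
    """Flag a site whose haplotype difference is flat in normal but present in tumor.

    site_counts maps tumor fraction (0.0 to 1.0) to ((alt1, n1), (alt2, n2)).
    normal_max and tumor_min are hypothetical thresholds, not values from the slides.
    """
    diffs = {tf: haplotype_af_difference(h1, h2)
             for tf, (h1, h2) in site_counts.items()}
    in_normal = diffs.get(0.0, 0.0)
    in_highest_tumor = diffs[max(diffs)]
    return in_normal <= normal_max and in_highest_tumor >= tumor_min
```

A germline het site fails the check because the two haplotypes already differ in the 0% tumor sample, whereas a somatic site starts near zero and separates as tumor fraction rises.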

9 of 15

Model evaluation

For the 31 selected sites, the model called every site somatic in all samples with 25% or more tumor genomes. In the normal sample (0% tumor), only one site was misclassified as somatic.

Variant Call Type   0% Tumor   25% Tumor   50% Tumor   75% Tumor   100% Tumor
Germline            30         0           0           0           0
Somatic             1          31          31          31          31
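
The counts above can be converted to per-sample somatic call rates with a few lines of arithmetic. The counts are copied from the table; the variable and function names are illustrative. Because the 31 sites were pre-selected as likely somatic, these rates should not be read as genome-wide sensitivity or false-positive estimates.

```python
# Counts copied from the table above (31 selected sites per sample).
tumor_percent = [0, 25, 50, 75, 100]
germline_calls = [30, 0, 0, 0, 0]
somatic_calls = [1, 31, 31, 31, 31]

def somatic_call_rate(somatic, germline):
    """Fraction of the selected sites that were called somatic in one sample."""
    return somatic / (somatic + germline)

rates = {
    pct: somatic_call_rate(s, g)
    for pct, s, g in zip(tumor_percent, somatic_calls, germline_calls)
}

for pct, rate in rates.items():
    print(f"{pct:>3}% tumor: {rate:.1%} of sites called somatic")
```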

10 of 15

Preliminary Results -- Filtering and FN issues to be addressed

Figure: Hap1 AF and Hap2 AF by tumor fraction, for non-somatic and somatic events. Call types shown: REF, Germline HET, Germline HOM, Somatic.

11 of 15

Future Work

  • More filtering, more evaluation
  • Refine models
  • Evaluate sensitivity, PPV, and the lower limit of detection
  • Explore application to variant types other than SNVs

12 of 15

Limitations

13 of 15

Somatic Variant Calling Model

14 of 15

Problematic Case: Mixed Haplotype in Normal Sample

Matched Normal

25% Tumor

75% Tumor

Mixed haplotypes in the germline sample generate FP somatic calls. Requires orthogonal filters.

15 of 15

False Negatives -- Phasing Issues

Matched Normal

50% Tumor

Isolated somatic variant drives ‘false’ phasing -- somatic variants look like germline

Requires changes to phasing method / incorporation of phasing likelihoods