1 of 21

Benchmarking of Mutational Signature Assignment Tools

Xi (Sam) Wang, Dr. Marcos Díaz-Gay, Dr. Raviteja Vangara, Dr. Ludmil B. Alexandrov

2 of 21

Somatic mutations: changes in DNA acquired in non-germline cells

Figure from Wikipedia: somatic mutation

3 of 21

Somatic mutations can be generated by mutagenic exposures

Alexandrov, L. B. et al. Nature 500, 415–421 (2013).

4 of 21

Patterns of mutations are linked to sources of DNA damage

  • Environmental exposures
    • Tobacco smoking or chewing
  • Failure in DNA replication or repair
    • Aberrant mismatch repair pathway
  • Normal cellular activities
    • Spontaneous deamination of methylated cytosines

Helleday, T., Eshtad, S. & Nik-Zainal, S. Nat. Rev. Genet. 15, 585–598 (2014).

5 of 21

Mutational Signature: defined by base substitutions and context

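  • Six classes of single-base substitutions: C>A, C>G, C>T, T>A, T>C, T>G
  • Reported by the pyrimidine (C or T) of the mutated base pair
  • Adding the 5’ and 3’ adjacent bases
  • 6 × 4 × 4 = 96 possible mutation types when context is considered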

Tate, J. G. et al. Nucleic Acids Res. 47, D941–D947 (2018). | https://cancer.sanger.ac.uk/cosmic/signatures_v2
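The 96 channels can be enumerated, and a single substitution mapped onto one of them, in a few lines. This is a minimal sketch assuming COSMIC-style labels such as A[C>A]A; the function names are hypothetical, and the channel order used here (substitution, then 5’ base, then 3’ base) may differ from the order any particular tool uses.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The six substitution classes, written with a pyrimidine (C or T) reference base
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

def sbs96_channels():
    """List the 96 trinucleotide-context channels as 5'[REF>ALT]3' labels (hypothetical helper)."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, BASES)]

def classify(five_prime, ref, alt, three_prime):
    """Map one single-base substitution plus its flanking bases to an SBS-96 label.

    If the reference base is a purine (A or G), the mutation is reported on the
    opposite strand: every base is complemented and the 5'/3' flanks swap sides.
    """
    if ref in "AG":
        five_prime, three_prime = (three_prime.translate(COMPLEMENT),
                                   five_prime.translate(COMPLEMENT))
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{five_prime}[{ref}>{alt}]{three_prime}"

print(len(sbs96_channels()))          # → 96
print(classify("G", "G", "T", "A"))   # → T[C>A]C
```

The second call shows the pyrimidine rule at work: a G>T change in the context G_A is the same event as a C>A change in the context T_C on the other strand.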

6 of 21

Signature Assignment

  • The matrix:

X = W × H

  • X: Mutation matrix (given), mutation types × samples
    • Rows: A[C>A]A, A[C>A]C, …, T[C>G]T; columns: Sample 1, Sample 2, …
  • W: Signatures matrix (standard cosmic_v3), mutation types × signatures
    • Rows: A[C>A]A, A[C>A]C, …, T[C>G]T; columns: SBS1, SBS2, …
  • H: Activities/exposures matrix (WHAT WE WANT!), signatures × samples
    • Rows: SBS1, SBS2, …, SBS60; columns: Sample 1, Sample 2, …
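Given X and W, H can be estimated one sample (column) at a time. The sketch below uses non-negative least squares (NNLS, one of the algorithms among the benchmarked tools) through `scipy.optimize.nnls`. The three toy signatures are random and hypothetical, not the COSMIC v3 signatures, and this is not presented as the implementation of any benchmarked tool.

```python
import numpy as np
from scipy.optimize import nnls

def assign_activities(X, W):
    """Estimate H >= 0 with X ≈ W @ H by solving one NNLS problem per sample."""
    n_signatures, n_samples = W.shape[1], X.shape[1]
    H = np.zeros((n_signatures, n_samples))
    for j in range(n_samples):
        H[:, j], _residual = nnls(W, X[:, j])
    return H

# Hypothetical toy data: 96 mutation types, 3 made-up signatures, 2 samples
rng = np.random.default_rng(0)
W = rng.random((96, 3))
W /= W.sum(axis=0)                    # each signature is a distribution over the 96 types
H_true = np.array([[500.0, 0.0],
                   [0.0, 300.0],
                   [200.0, 100.0]])   # mutations contributed by each signature per sample
X = W @ H_true                        # noise-free mutation counts
H_est = assign_activities(X, W)
print(np.round(H_est, 1))
```

With noise-free counts and linearly independent signatures, NNLS recovers H exactly; adding noise to X is what makes the benchmark interesting.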

7 of 21

In other words:

Mutational signature assignment is the process of identifying which known mutational signatures contribute most to an individual’s mutations, and therefore to the potential or existing causes of that individual’s cancer.

Accurate assignment means accurate discovery of the causes of cancer!

8 of 21

Problem

No systematic benchmark covers all published mutational signature assignment tools.

🤷‍♀️🤷‍♂️

9 of 21

So, what do we do in this project? Benchmark them all!

Currently published signature assignment tools:

  • Our own tools at the Alexandrov Lab: SigProfilerSingleSample, SigProfilerExtractor (decomposition module)
  • 15 other tools, using the following algorithms:
    • NNLS (non-negative least squares)
    • Multiple linear regression
    • Quadratic programming
    • Cone projection
    • Bootstrapping
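Bootstrapping can be layered on top of a fit such as NNLS to gauge how stable an assignment is. This is a minimal sketch assuming one common scheme: resample a sample's mutation counts multinomially, refit NNLS on each replicate, and report how often each signature is active. The function name `bootstrap_nnls`, the replicate count, and the sparse toy signatures are hypothetical and not taken from any benchmarked tool.

```python
import numpy as np
from scipy.optimize import nnls

def bootstrap_nnls(counts, W, n_boot=200, seed=0):
    """Refit NNLS on multinomially resampled mutation counts for one sample.

    Returns the mean activity per signature and the fraction of bootstrap
    replicates in which each signature received a positive activity.
    """
    rng = np.random.default_rng(seed)
    total = int(counts.sum())
    probs = counts / counts.sum()
    fits = np.empty((n_boot, W.shape[1]))
    for b in range(n_boot):
        # Same number of mutations, mutation types resampled from the observed profile
        resampled = rng.multinomial(total, probs)
        fits[b], _ = nnls(W, resampled.astype(float))
    return fits.mean(axis=0), (fits > 0).mean(axis=0)

# Hypothetical toy data: 3 sparse, fairly distinct signatures over 96 types
rng = np.random.default_rng(1)
W = rng.dirichlet(np.full(96, 0.3), size=3).T   # shape (96, 3); each column sums to 1
counts = rng.multinomial(1000, W @ np.array([0.8, 0.2, 0.0]))
mean_activity, frac_active = bootstrap_nnls(counts, W)
print(np.round(mean_activity), frac_active)
```

A signature that is active in only a small fraction of replicates is a weak candidate for assignment.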

10 of 21

How do we benchmark?

  • Git! Automation scripts! Benchmarking performance scripts!
  • Qualitative & Quantitative Performance Analysis

11 of 21

Current Progress?

Qualitative & Quantitative

12 of 21

Qualitative Analysis Method

  • For the assignment results of each tool, we want:
    • True positives, true negatives, false positives, and false negatives, to get:
    • Precision, sensitivity, and specificity, averaged across all scenarios per noise level
  • A yes-or-no question: is each signature present in each sample?

* Example shown: scenario 2 without noise, ground-truth activities vs. activities assigned by tool 03_deconstructSigs

[Figure: Ground Truth vs. Assignment from 03_deconstructSigs]
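The presence/absence comparison can be sketched as below. It assumes a signature is called present when its activity exceeds a threshold (greater than 0 by default, or for example greater than 1% of TMB) and assumes TMB is the sample's total mutation count, taken from the ground-truth column sums. The function name and toy matrices are hypothetical.

```python
import numpy as np

def presence_metrics(H_true, H_pred, min_fraction=0.0):
    """Confusion counts plus precision, sensitivity and specificity of presence calls.

    H_true, H_pred: signatures x samples activity matrices.
    A signature is called present in a sample when its activity exceeds
    min_fraction * TMB. Assumption: TMB is the sample's total mutation count,
    taken from the ground-truth column sums.
    """
    H_true = np.asarray(H_true, dtype=float)
    H_pred = np.asarray(H_pred, dtype=float)
    threshold = min_fraction * H_true.sum(axis=0)   # one cutoff per sample (column)
    truth = H_true > threshold
    called = H_pred > threshold
    tp = int(np.sum(truth & called))
    tn = int(np.sum(~truth & ~called))
    fp = int(np.sum(~truth & called))
    fn = int(np.sum(truth & ~called))
    nan = float("nan")
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

# Hypothetical toy example: 3 signatures x 2 samples
H_true = np.array([[100, 0], [0, 50], [50, 50]])
H_pred = np.array([[90, 5], [10, 45], [50, 50]])
print(presence_metrics(H_true, H_pred))                     # "greater than 0"
print(presence_metrics(H_true, H_pred, min_fraction=0.01))  # "greater than 1% TMB"
```

Averaging these per-scenario values at each noise level gives the kind of summary described above.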

13 of 21

Qualitative Analysis Result: all scenarios

[Figure panel: Greater than 0]

Cluster 1: QPSig, SigsPack, SignatureEstimation_QP

Cluster 2: MutationalPatterns, YAPSA (normal), MutationalCone

14 of 21

Qualitative Analysis Result: all scenarios

[Figure panels: Greater than 0; Greater than 1% TMB]

15 of 21

A step forward: Quantitative Analysis Method

  • Simple!
  • Compare each tool’s assignment results against the ground truth
  • By calculating:
    • Sum of absolute differences by TMB
      • i.e., the percentage of mutations misassigned by the tool
    • Average cosine similarity (cos_sim) scores
      • Applied to the activities matrix this time!
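Both quantitative metrics reduce to a few array operations on the activities matrices. This sketch assumes TMB is the sample's total mutation count (the ground-truth column sum) and that cos_sim is computed per sample between the true and assigned activity vectors, then averaged. The function names and toy values are hypothetical.

```python
import numpy as np

def sad_by_tmb(H_true, H_pred):
    """Sum of absolute activity differences per sample, as a percentage of TMB.

    Assumption: TMB is the sample's total mutation count (ground-truth column sum).
    """
    H_true = np.asarray(H_true, dtype=float)
    H_pred = np.asarray(H_pred, dtype=float)
    return 100 * np.abs(H_true - H_pred).sum(axis=0) / H_true.sum(axis=0)

def cos_sim_per_sample(H_true, H_pred):
    """Cosine similarity between true and assigned activity vectors, one value per sample."""
    H_true = np.asarray(H_true, dtype=float)
    H_pred = np.asarray(H_pred, dtype=float)
    dots = (H_true * H_pred).sum(axis=0)
    norms = np.linalg.norm(H_true, axis=0) * np.linalg.norm(H_pred, axis=0)
    return dots / norms

# Hypothetical toy example: 3 signatures x 2 samples
H_true = np.array([[100, 0], [0, 50], [50, 50]])
H_pred = np.array([[90, 5], [10, 45], [50, 50]])
print(sad_by_tmb(H_true, H_pred))                  # 20/150 and 10/100 of TMB, as percentages
print(cos_sim_per_sample(H_true, H_pred).mean())   # average cos_sim across samples
```

With this definition, one misassigned mutation is counted twice (missing from one signature, added to another); halving the sum is an alternative if the metric should read strictly as a fraction of mutations.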

16 of 21

Sum of absolute differences by TMB

17 of 21

Quantitative Analysis Result: sum of absolute differences by TMB

18 of 21

Quantitative Analysis Result: average cos_sim

19 of 21

Conclusion

  • Consistent patterns are shown across all analyses
  • All tools’ accuracy decreases as noise in the samples increases
  • Next steps: run all the tools that take VCF input and the long-running tools, finish the run-time analysis, then, if possible, finish benchmarking!

20 of 21

Future Impacts!

  • Know where to improve

  • Improve our own tool!

  • Improve accuracy -> clinically applicable -> happy patients!

21 of 21

Thank you! Questions?