1 of 37

diaQTL: QTL mapping in outbred tetraploid diallel populations

Jeffrey Endelman, University of Wisconsin-Madison

Tools for Genomics-Assisted Breeding in Polyploids

January 15, 2021

2 of 37

Joint Linkage Analysis & Selection in Autotetraploid Potato & Blueberry

USDA NIFA AFRI (2019–2023)

PolyOrigin for haplotype reconstruction

bioRxiv 2020.12.18.423519

diaQTL for QTL mapping

bioRxiv 2020.12.18.423479

J. Endelman

R. Amadeu

C. Zheng

P. Muñoz

3 of 37

By “diallel” we mean partial diallel

2006

4 of 37

diaQTL

  • R package, maintained on GitHub
  • Multi-allelic QTL
  • Basic workflow: QTL Discovery → Haplotype Effects
  • Fit models with additive, dominance and polygenic effects
  • Include fixed effects for year, location, etc.
  • Tools for visualizing and selecting on haplotypes

5 of 37

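[Table: haplotype dosage matrix. Rows are the phenotypes y1–y5; columns are the 12 parental haplotypes (A.1–A.4, B.1–B.4, C.1–C.4) of parents A, B and C. Each entry is the dosage of that haplotype, and each row sums to 4; y4 carries one haplotype at dosage 2.]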

3x3 half-diallel

Use regression to estimate additive effect for each haplotype
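
As a minimal sketch of the predictor behind this regression (not diaQTL code; the genotype tuples, function names and effect values are made up for illustration), each offspring can be coded by its dosage of the 12 parental haplotypes, and the additive prediction is the dosage-weighted sum of haplotype effects:

```python
from collections import Counter

# The 12 parental haplotypes of a 3x3 half-diallel (parents A, B, C).
HAPLOTYPES = [f"{p}.{i}" for p in "ABC" for i in range(1, 5)]


def dosage_row(genotype):
    """Return the dosage (0-4) of each parental haplotype for one tetraploid offspring."""
    if len(genotype) != 4:
        raise ValueError("a tetraploid genotype has exactly four haplotypes")
    counts = Counter(genotype)
    return [counts.get(h, 0) for h in HAPLOTYPES]


def additive_prediction(genotype, effects, intercept=0.0):
    """Intercept plus the sum of dosage * haplotype effect (effects: dict by haplotype)."""
    row = dosage_row(genotype)
    return intercept + sum(d * effects[h] for d, h in zip(row, HAPLOTYPES))


# Hypothetical offspring: one with four distinct haplotypes, one with a duplicated haplotype.
simplex = ("A.1", "A.3", "B.2", "B.4")
duplex = ("A.2", "A.2", "C.1", "C.3")
```

Because every row of the dosage matrix sums to 4, the columns are collinear with the intercept, which is one reason the haplotype effects are treated as random rather than estimated by ordinary least squares.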

6 of 37

Output from fitQTL function

[Figure: additive effect of each haplotype, grouped by parent (A, B, C)]

Each haplotype has an estimated effect, even though in reality # QTL alleles < # haplotypes

Can hypothesize that haplotypes with similar effect contain the same QTL allele

7 of 37

Dominance effects

  • Decomposed into digenic, trigenic, and quadrigenic components
    • Nice statistical properties (orthogonal effects)
    • Higher order terms tend to account for less variance

Effect       Predictor variable  Levels (examples)          No. parameters in 3x3 half-diallel without selfing
additive     haplotype           A.1, A.2, B.1              12
digenic      diplotype           A.1+A.2, A.1+B.1           78
trigenic     triplotype          A.1+A.2+B.1, A.1+B.1+B.2   240
quadrigenic  tetratype           A.1+A.2+B.1+B.2            300
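
The parameter counts can be checked by brute-force enumeration. The sketch below is not taken from diaQTL; it assumes that a parental gamete may carry two copies of the same haplotype (double reduction), since that assumption is what reproduces the counts 12, 78, 240 and 300. All names are illustrative.

```python
from itertools import combinations, combinations_with_replacement, product

# Four haplotypes per parent; crosses of a 3x3 half-diallel without selfing.
parents = {p: [f"{p}.{i}" for i in range(1, 5)] for p in "ABC"}
crosses = list(combinations(parents, 2))  # (A,B), (A,C), (B,C)


def gametes(haplotypes):
    """Unordered pairs with replacement: assumes double reduction is possible."""
    return list(combinations_with_replacement(haplotypes, 2))


def count_effect_levels():
    """Count distinct haplotype sets of size 1-4 that occur in any offspring genotype."""
    levels = {k: set() for k in (1, 2, 3, 4)}
    for p1, p2 in crosses:
        for g1, g2 in product(gametes(parents[p1]), gametes(parents[p2])):
            genotype = g1 + g2  # four haplotype labels
            for k in levels:
                for idx in combinations(range(4), k):
                    levels[k].add(tuple(sorted(genotype[i] for i in idx)))
    return {k: len(v) for k, v in levels.items()}


counts = count_effect_levels()
# counts == {1: 12, 2: 78, 3: 240, 4: 300}
```

The rapid growth from 12 to 300 levels illustrates why the higher-order dominance terms need many more progeny to estimate.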

8 of 37

Dominance effects

  • When referring to a particular model, it includes all lower order effects

Model        Effects
additive     a
digenic      a + d
trigenic     a + d + t
quadrigenic  a + d + t + q

9 of 37

Complete dominance


10 of 37

Output from fitQTL

Above diagonal: digenic effects

Below diagonal: additive + digenic effects

[Figure: example heatmap of pairwise haplotype effects; both axes list the haplotypes A.1–A.4, B.1–B.4, C.1–C.4]

11 of 37

QTL Discovery

scan1 returns LOD score at each marker

Markers with LOD score above the threshold are declared significant

Need to control genome-wide false positive rate

  • Permutation test
  • Simulations without any QTL
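
The permutation idea can be sketched as follows. This is a toy illustration and not the scan1 implementation: it uses a single-marker linear regression LOD, (n/2) log10(RSS0/RSS1), in place of the Bayesian model, and all function names and default values are assumptions made for the example.

```python
import math
import random


def lod_single_marker(x, y):
    """LOD score for a simple linear regression of phenotype y on marker dosage x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    rss0 = sum((b - my) ** 2 for b in y)  # residual SS of the no-QTL model
    if sxx == 0 or rss0 == 0:
        return 0.0
    rss1 = max(rss0 - sxy ** 2 / sxx, 1e-12)  # residual SS with the marker
    return (n / 2) * math.log10(rss0 / rss1)


def permutation_threshold(markers, y, n_perm=200, alpha=0.05, seed=1):
    """Genome-wide threshold: (1 - alpha) quantile of the maximum LOD under permutation."""
    rng = random.Random(seed)
    y_perm = list(y)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any marker-phenotype association
        max_lods.append(max(lod_single_marker(m, y_perm) for m in markers))
    max_lods.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[idx]
```

Taking the maximum LOD over all markers within each permutation is what controls the genome-wide, rather than per-marker, false positive rate. It also shows why the threshold rises as more of the genome is scanned.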

12 of 37

LOD threshold

Threshold increases with

  • number of parents
  • genome size

 

Simulated half-diallel populations

13 of 37

Statistical Power

Simulated half-diallel populations

[Figure: power curves for h2 = 0.1 and h2 = 0.2]

14 of 37

Accuracy of haplotype effect estimation also depends on pph

Accuracy = correlation between simulated and estimated effects

[Figure: accuracy curves for h2 = 0.1 and h2 = 0.2]

15 of 37

How does it work?

  • Due to the large number of model parameters, QTL effects are modeled as random
  • To allow for non-normal distribution, Bayesian regression models originally developed for genomic selection are used
    • R package BGLR (Pérez and de los Campos 2014)
  • Need to specify number of iterations for the algorithm
    • Function set_params provides guidance
  • Model fit vs. complexity assessed using DIC (deviance information criterion)
    • DIC = -2 (log likelihood at posterior mean) + 2 (effective # parameters)
    • Lower is better
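
A worked sketch of the DIC formula, assuming the usual definition of the effective number of parameters (posterior mean deviance minus deviance at the posterior mean). The function name and the numbers are illustrative, not output from diaQTL or BGLR.

```python
def dic(loglik_at_posterior_mean, loglik_samples):
    """Return (DIC, effective number of parameters) from MCMC log-likelihoods."""
    deviance_at_mean = -2.0 * loglik_at_posterior_mean
    mean_deviance = -2.0 * sum(loglik_samples) / len(loglik_samples)
    p_d = mean_deviance - deviance_at_mean  # effective number of parameters
    return deviance_at_mean + 2.0 * p_d, p_d


# Made-up values: log likelihood of -100 at the posterior mean,
# and three sampled log likelihoods averaging -103.
dic_value, p_d = dic(-100.0, [-102.0, -103.0, -104.0])
# dic_value == 212.0, p_d == 6.0
```

The first term rewards fit and the second penalizes complexity, so adding dominance terms lowers DIC only when the gain in fit outweighs the extra effective parameters.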

16 of 37

Current version is 0.91

Consult NEWS file to see what changes have been made

17 of 37

Vignette dataset

3x3 half-diallel

W6511-1R

VillettaRose

W9914-1R

18 of 37

Input files

  • Three input files needed
    • Pedigree
    • Genotype
    • Phenotype

Generate from PolyOrigin output using read_polyancestry

19 of 37

20 of 37

21 of 37

 

22 of 37

specifies maximum possible dominance that will be fitted

dominance  Model
1          additive
2          digenic
3          trigenic
4          quadrigenic

23 of 37

24 of 37

25 of 37

26 of 37

r2 = squared correlation between predicted and observed response variable

deltaDIC = change in DIC relative to no-QTL model
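
Both summary statistics are simple to compute by hand. The sketch below is illustrative only; the function names are hypothetical, and the sign convention for deltaDIC (QTL model minus no-QTL model, so negative values favor the QTL) is an assumption.

```python
import math


def squared_correlation(predicted, observed):
    """r2: squared Pearson correlation between predicted and observed values."""
    n = len(observed)
    mp, mo = sum(predicted) / n, sum(observed) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    spp = sum((p - mp) ** 2 for p in predicted)
    soo = sum((o - mo) ** 2 for o in observed)
    if spp == 0 or soo == 0:
        return math.nan  # correlation is undefined for a constant vector
    return spo ** 2 / (spp * soo)


def delta_dic(dic_qtl_model, dic_no_qtl_model):
    """Change in DIC relative to the no-QTL model (assumed: negative favors the QTL)."""
    return dic_qtl_model - dic_no_qtl_model
```

For example, `delta_dic(90.0, 100.0)` gives `-10.0`, meaning the QTL model has the lower DIC.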

27 of 37

28 of 37

Haplotype effects

29 of 37

Select the dominance model

30 of 37

Proportion of variance

 

31 of 37

32 of 37

Haplotype selection

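[Table: haplotype dosage matrix. Rows are the individuals id1–id5; columns are the 12 parental haplotypes (A.1–A.4, B.1–B.4, C.1–C.4). Each entry is the dosage of that haplotype, and each row sums to 4; id4 carries one haplotype at dosage 2.]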

33 of 37

34 of 37

Multiple QTL mapping

  • In the absence of epistasis, multiple QTL on different chromosomes can be detected adequately with scan1
  • For epistatic QTL on different chromosomes, both scan1 and fitQTL have the option of including a marker as a cofactor in the analysis
  • To resolve multiple QTL on the same chromosome, the best approach is a two-dimensional scan (scan2), which is under development and will be available soon

35 of 37

Binary Traits: GLM with probit link function

id             LB-Resistant
W16215-100rus  N
W16215-101rus  Y
W16215-103rus  Y
W16215-105rus  Y
W16215-106rus  N
W16215-108rus  Y
W16215-109rus  Y
W16215-110rus  N

Code the trait as Y/N

Karki et al. (2021) doi:10.1101/2020.09.27.315812

QTL effects correspond to linear predictor of the GLM
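
To make the link function concrete, here is a minimal sketch (not diaQTL code; the function names are hypothetical) that codes Y/N as 1/0 and converts a linear predictor into a probability with the probit link. Under this model the QTL effects add up on the linear-predictor scale, not on the probability scale.

```python
from statistics import NormalDist


def code_binary(values, yes="Y", no="N"):
    """Map Y/N labels to 1/0, rejecting anything else."""
    mapping = {yes: 1, no: 0}
    try:
        return [mapping[v] for v in values]
    except KeyError as err:
        raise ValueError(f"unexpected trait value: {err.args[0]!r}") from None


def probit_probability(linear_predictor):
    """P(resistant) = Phi(eta), where Phi is the standard normal CDF."""
    return NormalDist().cdf(linear_predictor)


# Trait values from the table above, in the same order.
lb_resistant = code_binary(["N", "Y", "Y", "Y", "N", "Y", "Y", "N"])
```

A linear predictor of 0 corresponds to a probability of 0.5, and equal steps in the linear predictor produce smaller changes in probability toward the tails.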

36 of 37

Binary Traits


diaQTL function fine_map

37 of 37

Bayesian Credible Interval (CI)