Linkage and phasing in polyploids

Marcelo Mollinari

mmollin@ncsu.edu

SCRI Workshop – January 14, 2021

This work is licensed under CC BY 4.0

Introduction

  • Genetic linkage is the tendency of markers located on the same chromosome to be inherited together.
  • The closer two markers are, the lower the probability that a crossover occurs between them; consequently, the more likely they are to be co-inherited.

Figure panels: Linked; Not linked; Not linked

  • How can we measure how likely it is that A and B are co-inherited?

Linkage analysis

  • We measure linkage using the recombination frequency (or fraction) in a segregating population.
  • Genetic linkage is a concept applied to at least two loci.
  • The recombination fraction is the probability that an odd number of crossovers occurs between two markers. It ranges from 0.0 to 0.5 (with double reduction, it can be higher).
  • We can transform these probabilities into distances using mapping functions (Morgan, Haldane, Kosambi, etc.).
  • By computing the recombination frequencies between pairs of markers and applying mapping functions, we can construct linkage maps, which show the linear order of the markers and the relative distances between adjacent ones.
  • First, let us address the behavior of a single locus when transmitted across generations.
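As a small aside on the mapping functions named above, the sketch below converts a recombination fraction into a map distance in centimorgans using the standard Morgan, Haldane, and Kosambi formulas. The function names are illustrative and do not come from any package used in the workshop.

```python
import math

def _check(r):
    # The log-based functions are defined only for 0 <= r < 0.5.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")

def morgan_cm(r):
    """Morgan: distance proportional to r (reasonable for short intervals)."""
    _check(r)
    return 100.0 * r

def haldane_cm(r):
    """Haldane: assumes crossovers occur without interference."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi: allows for positive crossover interference."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def haldane_inverse(d_cm):
    """Recombination fraction implied by a Haldane distance in cM."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))

if __name__ == "__main__":
    for r in (0.01, 0.10, 0.30):
        print(r, morgan_cm(r), haldane_cm(r), kosambi_cm(r))
```

For small r the three functions nearly agree; they diverge as r approaches 0.5, where the Haldane and Kosambi distances grow without bound.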

Gamete formation in polyploids*

*random pairing and no double reduction

Segregation in polyploids*

Figure labels: Multiallelic, Biallelic (marker types); Diploid, Tetraploid, Hexaploid (ploidy levels)

*random pairing and no double reduction
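Under the two assumptions in the footnote (random pairing, no double reduction), a gamete receives a random half of the parent's homologs, so the number of copies of a biallelic variant in the gamete follows a hypergeometric distribution. The sketch below computes the expected gametic and offspring dosage frequencies on that basis. The function names are illustrative, and the numbers shown on the slide's figure are not reproduced here.

```python
from math import comb

def gamete_dosage_probs(ploidy, dosage):
    """P(gamete carries k copies), k = 0..ploidy/2, for a parent with the
    given dosage. Assumes random pairing and no double reduction."""
    half = ploidy // 2
    total = comb(ploidy, half)
    return [comb(dosage, k) * comb(ploidy - dosage, half - k) / total
            for k in range(half + 1)]

def offspring_dosage_probs(ploidy, dosage_p1, dosage_p2):
    """Expected offspring dosage frequencies from two independent gametes."""
    g1 = gamete_dosage_probs(ploidy, dosage_p1)
    g2 = gamete_dosage_probs(ploidy, dosage_p2)
    out = [0.0] * (ploidy + 1)
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            out[i + j] += a * b
    return out

if __name__ == "__main__":
    # Tetraploid duplex parent: gametes with 0, 1, 2 copies in ratio 1:4:1.
    print(gamete_dosage_probs(4, 2))
    # Hexaploid simplex x nulliplex cross.
    print(offspring_dosage_probs(6, 1, 0))
```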

Recombination fraction in diploids

Recombination fraction in diploids - Likelihood

where n is the number of individuals. The maximum likelihood estimator of r is

Recombination fraction in diploids

Toy example

Computing recombination frequencies in diploids using R and C++

https://github.com/mmollina/Cpp_and_R
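The sketch below is not taken from the repository above; it is a minimal Python illustration of the same idea under a simplifying assumption: a diploid backcross in which every individual can be classified as recombinant or non-recombinant. In that setting the likelihood is binomial and the maximum likelihood estimate of r is the observed proportion of recombinants. The counts used are hypothetical.

```python
import math

def _xlogy(x, y):
    # Convention 0 * log(0) = 0, so the boundary estimates r = 0 or 1 work.
    return 0.0 if x == 0 else x * math.log(y)

def log_likelihood(r, n_recombinant, n_total):
    """Binomial log-likelihood of r (constant term omitted)."""
    return (_xlogy(n_recombinant, r)
            + _xlogy(n_total - n_recombinant, 1.0 - r))

def mle_r(n_recombinant, n_total):
    """Closed-form MLE: the proportion of recombinant individuals."""
    return n_recombinant / n_total

def lod_score(n_recombinant, n_total):
    """log10 likelihood ratio of r = r_hat versus no linkage (r = 0.5)."""
    r_hat = mle_r(n_recombinant, n_total)
    diff = (log_likelihood(r_hat, n_recombinant, n_total)
            - log_likelihood(0.5, n_recombinant, n_total))
    return diff / math.log(10.0)

if __name__ == "__main__":
    n_rec, n = 12, 100  # hypothetical counts
    print("r_hat =", mle_r(n_rec, n))
    print("LOD   =", lod_score(n_rec, n))
```

In practice an estimate above 0.5 is usually truncated to 0.5 when double reduction is ignored; the sketch leaves that decision to the caller.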

Gamete formation in polyploids*

*no double reduction

Expected gametic frequency given a bivalent configuration

l: known number of recombinant bivalents between loci A and B

Unconditional gametic probability

Mollinari and Garcia (2019) doi:10.1534/g3.119.400378

Recombination Fraction – autotetraploid

Fully informative marker

Recombination Fraction – autotetraploid

Partially informative marker – Duplex/simplex – Association

Recombination Fraction – autotetraploid

Partially informative marker – Duplex/simplex – Repulsion

Recombination Fraction – assessing linkage phases

  • Pairwise MLEs of r are used to group markers into linkage groups and to order the markers within each linkage group using optimization algorithms such as MDS.
  • Given a sequence of ordered markers, it is possible to extend the idea of comparing the likelihoods of competing linkage phases across multiple markers.

Multidimensional Scaling Algorithm (MDS)

  • Reduces data from many dimensions to a few while preserving, as closely as possible, the observed distances between points, by minimizing a loss function L.

Sweetpotato linkage group 1: 2745 markers
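To illustrate how MDS can recover a marker order from pairwise distances, the sketch below uses classical (Torgerson) MDS, which is swapped in here for simplicity and is not necessarily the loss function L used on the slide. It double-centers the squared distance matrix and extracts the first coordinate by power iteration. The marker positions are hypothetical, and the pairwise distances are taken as exact, which real data never are.

```python
import math

def first_mds_coordinate(dist, n_iter=500):
    """First coordinate of classical MDS for a symmetric distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J, with J the centering matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenpair (assumed positive eigenvalue).
    v = [i - (n - 1) / 2 for i in range(n)]
    eigval = 0.0
    for _ in range(n_iter):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigval = norm
    return [x * math.sqrt(eigval) for x in v]

# Hypothetical marker positions (cM), listed in scrambled order.
positions = [30.0, 0.0, 12.0, 45.0, 5.0]
dist = [[abs(a - b) for b in positions] for a in positions]

coords = first_mds_coordinate(dist)
order = sorted(range(len(coords)), key=lambda i: coords[i])

if __name__ == "__main__":
    print(order)  # marker indices along the map (direction is arbitrary)
```

The recovered order is defined only up to reversal, since a linkage map has no intrinsic direction.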

Haplotyping in polyploids

  • Haplotyping determines the arrangement of allelic variants across the homologs within a homology group.

dosages

Multilocus linkage analysis in polyploids

Mollinari and Garcia (2019) doi:10.1534/g3.119.400378

Leach et al. (2010) doi:10.1073/pnas.0908477107

where r_k is the recombination frequency between loci k and k+1, p is the ploidy level, and l is the number of recombinant events between k and k+1.
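The equation these symbols belong to did not survive in this text. The sketch below implements one expression that is consistent with the definitions, assuming random bivalent pairing, no double reduction, and independent recombination in each bivalent: a move between two gamete states that differ in l homologs has probability (1 - r_k)^(p/2 - l) r_k^l / C(p/2, l). It should be checked against Mollinari and Garcia (2019) before use; the function names are illustrative.

```python
from itertools import combinations
from math import comb

def transition_prob(state_from, state_to, r, ploidy):
    """Probability of moving between two gamete states at adjacent loci.

    A state is the set of ploidy/2 homologs transmitted at a locus.
    Assumes random bivalent pairing and no double reduction.
    """
    half = ploidy // 2
    # l = number of homologs replaced between the two loci.
    l = len(set(state_from) - set(state_to))
    return (1.0 - r) ** (half - l) * r ** l / comb(half, l)

def transition_matrix(r, ploidy):
    """All gamete states for one parent and their transition matrix."""
    states = list(combinations(range(ploidy), ploidy // 2))
    matrix = [[transition_prob(a, b, r, ploidy) for b in states]
              for a in states]
    return states, matrix

if __name__ == "__main__":
    states, t = transition_matrix(0.1, 4)
    print(len(states), "states; first row sums to", sum(t[0]))
```

Each row sums to one because there are C(p/2, l)^2 target states at distance l, so the total probability of l recombinant events is the binomial term C(p/2, l) r^l (1 - r)^(p/2 - l).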

Markov model

Hidden Markov Model - HMM

Hidden Markov Model - HMM

Assessing different linkage phases using multilocus analysis

Hidden Markov Model - HMM

Individual 64:

Hidden Markov Model - HMM

  • Tetraploid example, one individual, 15 markers

1 1 1 3 1 1 2 2 0 1 2 1 1 0 1
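A simplified sketch of how an HMM scores a linkage phase is given below. It is not the MAPpoly implementation and does not use the dosage vector shown above. It models a single tetraploid parent (as if crossed to a nulliplex), so the observation at each marker is the gamete dosage, and it uses the forward algorithm with rescaling to compute the log-likelihood of a candidate phase. The phase configurations, gamete data, recombination fractions, and error rate are all hypothetical.

```python
import math
from itertools import combinations

def transition(a, b, r, ploidy):
    # Random bivalent pairing, no double reduction (assumed).
    half = ploidy // 2
    l = len(set(a) - set(b))
    return (1.0 - r) ** (half - l) * r ** l / math.comb(half, l)

def forward_loglik(phase, dosages, rec_fracs, ploidy=4, err=0.01):
    """Log-likelihood of one gamete's dosages given a candidate phase.

    phase[k][h] is 1 if homolog h carries the variant at marker k.
    Hidden states are the pairs of homologs transmitted at each marker.
    """
    states = list(combinations(range(ploidy), ploidy // 2))
    n_classes = ploidy // 2 + 1

    def emission(state, k):
        expected = sum(phase[k][h] for h in state)
        return 1.0 - err if expected == dosages[k] else err / (n_classes - 1)

    alpha = [emission(s, 0) / len(states) for s in states]
    loglik = 0.0
    for k in range(len(dosages)):
        if k > 0:
            alpha = [emission(t, k)
                     * sum(alpha[i] * transition(s, t, rec_fracs[k - 1], ploidy)
                           for i, s in enumerate(states))
                     for t in states]
        scale = sum(alpha)          # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Two hypothetical phases for three simplex markers.
coupling = [(1, 0, 0, 0), (1, 0, 0, 0), (1, 0, 0, 0)]
mixed = [(1, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0)]
gametes = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
rec = [0.05, 0.05]

def total_loglik(phase):
    return sum(forward_loglik(phase, g, rec) for g in gametes)

if __name__ == "__main__":
    print("coupling:", total_loglik(coupling))
    print("mixed:   ", total_loglik(mixed))
```

The phase with the higher total log-likelihood is the one better supported by the data; with these made-up gametes, that is the coupling configuration.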

Haplotype phasing – MAPpoly strategy

  • Step 1: Use of two-point information to reduce the search space

  • Step 2: Evaluate the remaining configurations, using HMM likelihood
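To give a sense of why Step 1 matters, the sketch below counts how many ways the alleles of each marker could be placed on the homologs of one parent if no two-point information were used. A marker with dosage d in a parent of ploidy p has C(p, d) possible placements, so the naive count is the product over markers. This is an upper bound that treats homologs as labeled, so configurations that differ only by relabeling homologs are counted more than once. The parental dosages are hypothetical.

```python
from math import comb

def placements(ploidy, dosage):
    """Ways to place `dosage` copies of a variant on `ploidy` homologs."""
    return comb(ploidy, dosage)

def naive_search_space(ploidy, dosages):
    """Upper bound on phase configurations for one parent (labeled homologs)."""
    total = 1
    for d in dosages:
        total *= placements(ploidy, d)
    return total

if __name__ == "__main__":
    parent_dosages = [1, 2, 3, 1, 2]  # hypothetical hexaploid parent
    print(naive_search_space(6, parent_dosages))
```

Even five markers in a hexaploid give a six-figure count, which is why the candidate configurations are pruned before the HMM likelihood is evaluated.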

Sweetpotato genetic map

Mollinari et al. (2020)

Sweetpotato genetic map

Probabilistic haplotype reconstruction

  • When assuming a prior probability distribution of the genotypes, multilocus strategies can improve the quality of the inferred haplotypes

Tetraploid potato

Probabilistic haplotype reconstruction

Hexaploid sweetpotato

References

  • Haldane, J. Theoretical Genetics of Autopolyploids. J. Genet. 22, 359–372 (1930).
  • Mather, K. Reductional and equational separation of the chromosomes in bivalents and multivalents. J. Genet. 30, 53–78 (1935).
  • Mather, K. Segregation and linkage in autotetraploids. J. Genet. 32, 287–314 (1936).
  • Mather, K. The measurement of linkage in heredity. (1938).
  • Fisher, R. A. The theory of linkage in polysomic inheritance. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 233, 55–87 (1947).
  • Lander, E. S. & Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. U. S. A. 84, 2363–2367 (1987).
  • Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
  • Ripol, M. I., Churchill, G. A., da Silva, J. A. G. & Sorrells, M. Statistical aspects of genetic mapping in autopolyploids. Gene 235, 31–41 (1999).
  • Luo, Z. W., Zhang, R. M. & Kearsey, M. J. Theoretical basis for genetic linkage analysis in autotetraploid species. Proc. Natl. Acad. Sci. U. S. A. 101, 7040–7045 (2004).
  • Leach, L. J., Wang, L., Kearsey, M. J. & Luo, Z. Multilocus tetrasomic linkage analysis using hidden Markov chain model. Proc. Natl. Acad. Sci. U. S. A. 107, 4270–4274 (2010).
  • Hackett, C. A., McLean, K. & Bryan, G. J. Linkage Analysis and QTL Mapping Using SNP Dosage Data in a Tetraploid Potato Mapping Population. PLoS One 8, (2013).
  • Zheng, C. et al. Probabilistic Multilocus Haplotype Reconstruction in Outcrossing Tetraploids. Genetics 203, 119–131 (2016).
  • Bourke, P. M. Genetic mapping in polyploids. (Wageningen University, 2018).
  • Mollinari, M. & Garcia, A. A. F. Linkage Analysis and Haplotype Phasing in Experimental Autopolyploid Populations with High Ploidy Level Using Hidden Markov Models. G3 9, 3297–3314 (2019).