2 of 20

Relative performance of customized and universal probe sets in target enrichment: A case study in subtribe Malinae

Article by: Roman Ufimov, Vojtěch Zeisek, Soňa Píšová, William J. Baker, Tomáš Fér, Marcela van Loo, Christoph Dobeš and Roswitha Schmickl

3 of 20

Introduction: What was known before this publication?

  • Universal probes
    • more consistent target locus recovery for outgroup taxa
  • Customized probes
    • higher total number of target loci
    • higher locus recovery in the ingroup

  • Angiosperms353 vs. custom probes
    • similar degree of phylogenetic informativeness

Angiosperms353, Source: https://arborbiosci.com

4 of 20

Introduction: Why was this study necessary?

  • Angiosperms353 markers
    • Utility has not yet been fully evaluated
    • No attempt to optimize it for target enrichment in the data analysis
  • Lack of studies on this topic
    • One study comparing Angiosperms353 vs. custom probes (Larridon et al., 2020)
    • Two studies comparing universal probes vs. custom probes (Kadlec et al., 2017; Chau et al., 2018)

5 of 20

Introduction: What were the aims of the study?

  • Evaluation of the strengths and weaknesses of Angiosperms353
  • Assessing the phylogenetic utility of Angiosperms353:
    • in comparison with Malinae481
    • in comparison with optimized Angiosperms353

Focused on the genus Crataegus L.

Source: Xiao Du et al. (2019), Frontiers in Plant Science 10:1-12

6 of 20

Methods: Sampling and wet-lab

  • The subtribe Malinae (up to 30 genera + 10 hybrid) appears to be monophyletic. The level of divergence among the genera is low.
  • 25 species were sampled within the Malinae (13 from Crataegus, 12 from 7 other genera), with Prunus tenella (Amygdaleae) as an outgroup.
  • Libraries were prepared using the NEBNext Ultra DNA Library Prep protocol, and hybridization was done using myBaits biotinylated RNA baits.
  • Target-enriched libraries were mixed with unenriched libraries for better plastome recovery.

7 of 20

Methods: Processing nuclear data

  • Trimmed and deduplicated reads were analysed with HybPiper (assembly approach) and HybPhyloMaker (full-pipeline; reference-guided mapping approach)
  • HybPiper output was aligned with MAFFT and trimmed with a custom R script.
  • Individual gene trees were inferred with IQ-TREE under the best-fitting model selected by ModelFinder.
  • The species tree was reconstructed with ASTRAL, and support was calculated with PhyParts.
  • Two HybPiper runs and HybPhyloMaker were compared.

8 of 20

Methods: Processing plastid data

  • HybPhyloMaker was used to map preprocessed reads to the modified plastome reference of M. angustifolia (GenBank). Kindel was then used for consensus calling (min. read depth 2x, majority rule 51 %).
  • The number of recovered plastid reads was compared between the two probe sets; reads from both sets were then used to infer the phylogeny:
    • The sequences were aligned using MAFFT, then concatenated and partitioned with AMAS so that each partition contained either coding or non-coding sequences.
    • The plastome tree was inferred using RAxML-NG, with the best-fit model for each partition chosen according to ModelTest-NG results.
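To make the consensus-calling thresholds concrete, here is a minimal Python sketch of majority-rule calling with a minimum read depth of 2 and a 51 % majority. It is not Kindel's implementation: the `call_consensus` function, the list-of-strings pileup format, and the choice to mask uncalled positions with `N` are all assumptions made for illustration.

```python
from collections import Counter


def call_consensus(pileup, min_depth=2, threshold=0.51):
    """Call one consensus base per reference position.

    pileup: list of strings; each string holds the bases observed in the
    reads covering one position (hypothetical input format).
    Positions below min_depth, or without a base reaching the majority
    threshold, are masked with 'N' (an assumption, not Kindel's behaviour).
    """
    consensus = []
    for column in pileup:
        bases = [b for b in column.upper() if b in "ACGT"]
        depth = len(bases)
        if depth < min_depth:
            consensus.append("N")
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / depth >= threshold else "N")
    return "".join(consensus)


# Five positions: clear majority, 50/50 tie, depth 1, 75 % majority, no coverage
print(call_consensus(["AAAA", "AC", "G", "TTTC", ""]))  # ANNTN
```

With these settings a position covered by only two reads is called only when both reads agree, since a 1:1 split stays below 51 %.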

9 of 20

Probe and reference sets

  • Two probe sets (Angiosperms353 and the custom Malinae481) and four reference sets were used.
  • The Malinae481 probes were designed by:
    1. BLASTing Malus domestica “Golden Delicious” mRNAs against M. domestica genome and a two-step filtering.
    2. 1280 filtered mRNAs were then BLASTed against Pyrus communis genome to find common mRNAs and filtered with the two-step filtering.
    3. Exons and introns were then extracted and separated. Introns underwent yet another filtering
  • Two reference sets correspond to the probe sets: Angiosperms353 and Malinae481. The other two are fine-tuned versions of Angiosperms353.

10 of 20

Results: Comparison of Angiosperms353 with Malinae481

Both probe sets show a decrease in the number of loci, but in different ways → the most striking difference in the performance of the probes.

11 of 20

Higher target length in ingroup for Malinae481 and lower for Angiosperms353

12 of 20

Table 3: Alignment characteristics

13 of 20

Figure 2: Alignment length and proportion of PI sites
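For reference, a site is parsimony-informative (PI) when at least two different character states each occur in at least two sequences. The sketch below shows one way the two quantities in this figure could be computed for a single alignment. The function names and the toy alignment are hypothetical, gaps and ambiguity codes are simply ignored, and this is not the script used in the study.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def pi_proportion(alignment):
    """Return (alignment length, proportion of PI sites).

    alignment: list of equal-length aligned sequences (strings).
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must have equal length")
    columns = zip(*(seq.upper() for seq in alignment))
    n_pi = sum(is_parsimony_informative(col) for col in columns)
    return length, n_pi / length


# Toy alignment: columns 2 and 5 are parsimony-informative
toy = ["ACGTA", "ACGTC", "AAGTC", "AA-TA"]
print(pi_proportion(toy))  # (5, 0.4)
```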

14 of 20

Results: Bioinformatic optimization of the Angiosperms353 reference for the Malinae (Malinae-optimized)

The Malinae-optimized reference performed similarly to Angiosperms353, but the number of loci decreased more slowly with the Malinae-optimized reference.

15 of 20

Similar target lengths for both ingroup and outgroup

16 of 20

Table 3: Alignment characteristics
