1 of 77

Bridging the micro- and macroevolutionary levels in phylogenomics: Hyb-Seq solves relationships from populations to species and above

Tamara Villaverde, Elliot M. Gardner, Lisa Pokorny, Sanna Olsson, Mario Rincon-Barrado, Norman J. Wickett, Julia Molero, Ricarda Riina, Matthew G. Johnson and Isabel Sanmartín

https://nph.onlinelibrary.wiley.com/doi/10.1111/nph.15312

Richard Bačák

2 of 77

Introduction

Microevolution

genetic drift, mutation, migration
microevolutionary processes affect mostly individuals within populations
studies use multiple individuals per population and rely on repeated DNA or polymorphic molecular markers, as these provide more variability at the population level and can be used to detect recent admixture

Macroevolution

speciation, extinction, dispersal
macroevolutionary processes affect diversification at/above species level
studies often rely on DNA sequences from only one individual per species

High-throughput sequencing (HTS)

bridging the micro- and macroevolutionary gap in phylogenetics by scaling up the number of loci and individuals within populations and across species that can be sequenced at a reasonable cost

3 of 77

Aims

to test the utility of probes based on genomic resources from taxa not directly related to the focal group in providing phylogenetic resolution from within populations and species to above-species level
to assess the effect of extensive use of highly degraded DNA from herbarium material on capture success and potential bias in phylogenomic inference at different evolutionary scales
to test the ability of their designed probes to obtain off-target chloroplast sequences
to test the monophyly of Euphorbia balsamifera ssp. sepium and its phylogenetic relationship within E. balsamifera, and to estimate lineage divergence times

4 of 77

Fig. 1 Morphology and geographic distribution of Euphorbia balsamifera. (a) Euphorbia balsamifera ssp. balsamifera from the coast of Western Sahara. (b) Cyathium of ssp. balsamifera from Tenerife (Canary Islands). (c) Euphorbia balsamifera ssp. sepium from inland populations in Western Sahara near the border with Mauritania. (d) Fruit and long narrow leaves typical of ssp. sepium. (e) Specimen of E. balsamifera ssp. adenensis from Oman. (f) Cyathium and apical leaves of ssp. adenensis. (g, h) Geographic distribution of Euphorbia balsamifera in the Canary Islands, North Africa, and Arabian Peninsula based on herbarium specimens: ssp. balsamifera (circles), ssp. sepium (triangles), and ssp. adenensis (diamonds); (h) Detailed distribution of ssp. balsamifera in the Canary Islands. Fuchsia, green and orange symbols represent sampled DNA specimens of each taxon included in the phylogeny shown in Fig. 3(a), whereas grey symbols (all shapes) represent herbarium records without genomic data. Photos by: (a–d) R. Riina; (e, f) J. J. Morawetz.

5 of 77

Methods

Sampling

165 samples → 121 amplified

Probe design and sequence capture

The transcriptomes of two Euphorbia species (E. mesembryanthemifolia and E. pekinensis) were used as genomic resources for probe design
Probes were designed for 431 orthologous low-copy nuclear genes (LCNGs)
Sequencing libraries were prepared using the Illumina TruSeq Nano HT DNA Kit
Indexed samples were pooled
Enriched products were PCR amplified and purified using the QIAquick PCR purification kit

6 of 77

Methods

Data processing and phylogenetic analysis of targeted nuclear loci

Demultiplexed sequences were quality-filtered using TRIMMOMATIC
Poorly aligned regions were removed from alignments using TRIMAL v.1.2
The HYBPIPER pipeline was used to assemble loci
Summary statistics were obtained using SAMTOOLS v.1.8
Orthologous sequences from 428 nuclear loci containing only exons were aligned using MAFFT v.7.222
428 loci → 132 loci removed → 296 loci
The 296 exon alignments were concatenated into a supermatrix, and phylogenetic trees were built by maximum likelihood (ML): in IQ-TREE v.1.4.2 after automatic model selection with ModelFinder, and in RAxML v.8.2.9 using the GTRCAT model with 200 fast bootstraps followed by slow ML optimization
Alternatively, a method implementing the multispecies coalescent (MSC) model was used (ASTRAL-II v.2.4.7.7)
New alignments and ML phylogenies were created with PASTA (Practical Alignment using SATé and Transitivity)
SVDQUARTETS was used to infer a species tree under the coalescent framework
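The concatenation step above can be illustrated with a minimal Python sketch. It assumes each locus is already aligned and held as a dictionary from sample name to aligned sequence; the function name `concatenate` and the toy data are hypothetical and do not come from the paper, which used dedicated tools for this step. Samples missing from a locus are padded with gaps so that every row of the supermatrix has the same length.

```python
from typing import Dict, List


def concatenate(loci: List[Dict[str, str]], missing: str = "-") -> Dict[str, str]:
    """Concatenate per-locus alignments (sample -> aligned sequence) into one supermatrix."""
    # Collect every sample that appears in at least one locus.
    taxa = sorted({taxon for locus in loci for taxon in locus})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad samples that lack this locus with gap characters.
            parts[taxon].append(locus.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example (hypothetical data): sample C is missing from the first locus.
loci = [
    {"A": "ACGT", "B": "ACGA"},
    {"A": "GG", "B": "GC", "C": "GA"},
]
supermatrix = concatenate(loci)
```

In this toy case sample C receives four gap characters for the first locus, which is how missing loci typically end up represented in a concatenated matrix.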

7 of 77

Methods

Data processing and phylogenetic analysis of chloroplast (skimmed) data

To compare the nuclear and plastid phylogenetic signals, they recovered plastid DNA using the annotated plastome of Ricinus communis, transferring its annotations to a draft plastome of Euphorbia esula where similarity between the two plastomes was 95%, using the transfer-annotations function in GENEIOUS v.9.1.7
The authors extracted coding sequence (CDS) regions from each gene and used HYBPIPER to extract exon sequences
The exon sequences were analysed with RAxML applying GTRCAT and 200 fast bootstraps followed by slow ML optimization

Divergence time estimation

Divergence times were estimated in BEAST v.1.8 using the nuclear exon supermatrix
The dataset was run unpartitioned under the best-fitting substitution model estimated in IQ-TREE
Final analyses comprised Markov chain Monte Carlo (MCMC) searches run for 400 million generations, with samples logged every 40 000th generation.
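The MCMC settings above fix the number of logged samples per run. A small Python sketch of the arithmetic follows; the 10% burn-in is a hypothetical value chosen only for illustration and is not stated on the slide.

```python
def logged_samples(generations: int, log_every: int) -> int:
    """Number of samples written to the log over a full run."""
    return generations // log_every


def retained_after_burnin(n_samples: int, burnin_percent: int) -> int:
    """Samples left after discarding the first burnin_percent of the log."""
    discarded = n_samples * burnin_percent // 100
    return n_samples - discarded


n = logged_samples(400_000_000, 40_000)   # settings from the slide
kept = retained_after_burnin(n, 10)       # 10% burn-in is an assumption
```

With these settings each run logs 10 000 samples, and the assumed burn-in would leave 9000 of them for summarising the posterior.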

8 of 77

Results - Gene-capture success

Designed baits were successful
Capture success was much higher for silica-dried material than for herbarium material
The proportion of parsimony-informative sites was generally higher for introns (13%) than in supercontigs (9%) and exons (8%)
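The proportion of parsimony-informative sites can be computed directly from an alignment. The sketch below uses the standard definition (a site is informative when at least two different character states each occur in at least two sequences) and ignores gaps and ambiguity codes. The function names and the toy alignment are hypothetical; the authors' own summary statistics were produced with their pipeline, not with this code.

```python
from collections import Counter
from typing import List


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def proportion_informative(alignment: List[str]) -> float:
    """Fraction of alignment columns that are parsimony-informative."""
    flags = [is_parsimony_informative("".join(col)) for col in zip(*alignment)]
    return sum(flags) / len(flags)


# Toy alignment (hypothetical): columns 2 and 4 are informative.
alignment = ["ACGTA", "ACGTA", "ATGCA", "ATGCC"]
pis = proportion_informative(alignment)
```

The last column of the toy alignment is variable but not informative, because the state C occurs in only one sequence.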

9 of 77

Results - Phylogenetic estimation of gene tree-species tree from nuclear data

a, Maximum likelihood tree (296 concatenated exon loci, 121 samples, 486,878 bp) estimated in IQ-TREE

b, Species tree inferred with a multispecies coalescent (MSC) approach in ASTRAL-II using the 296-exon supermatrix; SVDQUARTETS obtained the same tree except for the position of E. noxia

c, Maximum likelihood tree obtained from 66 exon loci (63,133 bp) from the chloroplast genome

Bootstrap support (BS) was 100 for the backbone and main clade relationships, except for the position of E. noxia (BS = 94)

10 of 77

Results - Phylogenetic estimation of gene tree-species tree and Divergence time estimation

Maximum clade credibility (MCC) tree showing Bayesian estimates of divergence times

a, pie charts show gene tree conflict at each node relative to the ASTRAL-II species tree as estimated by phyparts
b, Expanded sampling within subspecies of E. balsamifera showing population-level relationships

Analysis of plastid data

only a low number of plastid sequences were recovered for E. balsamifera s.l., and none for any of the other taxa
no relationship was found between type of material and capture success

11 of 77

Results - Divergence time estimation

12 of 77

Discussion

Hyb-Seq as a tool to bridge the micro- and macroevolutionary gap in phylogenomics

the study shows that custom-designed probes combined with target sequencing and genome skimming (Hyb-Seq) can be applied to resolve evolutionary relationships from populations to species and above within the same tree
because the output of Hyb-Seq is DNA sequences of known targeted loci, the same molecular evolutionary models could be used across phylogenetic levels, from intrapopulation to populations and species, thus effectively bridging the micro- and macroevolutionary gap

Solving relationships within an ancient continental disjunct lineage

results from the phylogenomic analyses strongly support the monophyly of three subspecies within E. balsamifera: adenensis, balsamifera and sepium
The inferred divergence times between subspecies of E. balsamifera and with species in sect. Balsamis are older (ranging from Late Miocene to Early Pliocene) than those obtained in previous studies

13 of 77

The sweet tabaiba or there and back again: phylogeographical history of the Macaronesian Euphorbia balsamifera

Mario Rincón-Barrado, Tamara Villaverde, Manolo F. Perez, Isabel Sanmartín and Ricarda Riina

2024

Bego

https://academic.oup.com/aob/article/133/5-6/883/7514051

14 of 77

Introduction

E. balsamifera, E. adenensis & E. sepium

Populations of E. balsamifera sampled

Rand Flora pattern

15 of 77

Introduction

SSC (stepping-stone colonization model) → discarded because most plant groups arrived in the Canary Islands after the islands already existed
CIH (central island hub model)
SSH (surfing syngameon model) → SSC, extinction, recolonization from CI

16 of 77

Aims

(1) Test the hypothesis of persistence of relict populations of E. balsamifera on the continent (north-west Africa) against the hypothesis of climatic extinction in north-west Africa followed by back-colonization of the continent, by building a more robust population-level phylogeny of E. balsamifera and allies

(2) Discriminate among alternative colonization and diversification scenarios of E. balsamifera in the Canary Islands (the CIH and SSH models and the possibility of back-colonization), using CNNs and a large SNP dataset extracted from the target loci.

(3) Assess the impact of degraded DNA on the Hyb-Seq technique and how this can affect the reconstruction of relationships at both micro- and macro-evolutionary levels.

17 of 77

Methods

Sampling

32 samples of E. balsamifera collected in silica gel from seven localities in the islands of Fuerteventura, Lanzarote, Tenerife, La Gomera, La Palma and El Hierro, and 15 samples collected from four localities along the coasts of Morocco and Western Sahara

Data processing

HybPiper 1 pipeline → final dataset of 298 gene matrices with exon information and a concatenated total length of 511 697 bp

298 matrices were analysed using two approaches:

1) Trees analysed with ASTRAL to estimate a species-level phylogenetic tree under the multispecies coalescent model

2) Phylogenetic tree was constructed using IQ-TREE

Convolutional neural network analysis

To test alternative colonization models of the Canary Islands → They considered each island as a single population

18 of 77

Results

19 of 77

Discussion

Back-colonization of Africa rather than relicts of mainland populations

(1) By increasing taxon sampling (adding field-collected samples from the continent and from the eastern islands), they recovered a position that was partly congruent with the IQ-TREE analysis of Villaverde et al. (2018): the continental samples from north-west Africa were placed within a clade that included individuals from Lanzarote and Fuerteventura. Moreover, this western–eastern topology was supported by all their analyses [IQ-TREE (Fig. 3), ASTRAL (Fig. 4), BEAST (Fig. 6) and SVDQuartets (Fig. 7)], with high support.

Therefore, their results reject the hypothesis that the north-west African populations of E. balsamifera are climatic relicts

20 of 77

Discussion

Macro- and micro-evolution in ‘sweet tabaiba’

(2) Phylogenetic relationships at the species level (Figs 3 and 4) were identical to those recovered by Villaverde et al. (2018).

The phylogenetic trees generated from the Hyb-Seq DNA sequences show a clear east/west division among the Canarian populations of E. balsamifera → This phylogeographical pattern fits the CIH model better, as it predicts dispersal from the island of Tenerife towards the east and west of the archipelago

21 of 77

Discussion

Effect of the quality of herbarium material on Hyb-Seq efficiency at different evolutionary levels

(3) DNA sequences generated with Hyb-Seq from old herbarium samples might appear closely related and converge in the phylogenetic tree, not because of sequence similarity but because of the type and amount of damage sustained.

Artefacts, caused by low-quality DNA, are not always present and depend on the evolutionary level being tackled. For example, the herbarium samples of E. balsamifera, E. sepium and E. adenensis exhibited lower percentages of capture success. Despite that, all herbarium samples were correctly positioned within their respective species clades, and with high clade support in both studies.

Villaverde et al. (2018), using only herbarium samples from the continental populations, obtained contradictory results between the multispecies coalescent analyses (ASTRAL and SVDQuartets) and the IQ-TREE concatenated approach. Here, the approaches gave the same results

22 of 77

Conclusions

The study supports the efficiency of the Hyb-Seq technique to provide phylogenetic resolution at the micro- and macroevolutionary levels using freshly collected material and herbarium material, although the latter should be used with caution.

Also, their results support the Rand Flora hypothesis for the disjunct distribution of the Canarian E. balsamifera relative to its closest species (E. adenensis) on the opposite side of the Sahara Desert.

23 of 77

Phylogenetic relationships and the identification of allopolyploidy in circumpolar Silene sect. Physolychnis

Anne‐Sophie Quatela, Patrik Cangren, Paola de Lima Ferreira, Yannick Woudstra, Andreas Zsoldos‐Skahjem, Christine D. Bacon, Hugo J. de Boer, Bengt Oxelman, 2025

https://doi.org/10.1002/ajb2.70051

Eliška Krtilová

24 of 77

Introduction

High proportion of allopolyploids in previously glaciated areas
Species complexes common in recently divergent lineages (ILS)

Species complex Silene sect. Physolychnis

S. uralensis - type species
Diploids and allopolyploids morphologically similar
4x to 10x

25 of 77

Aims of the study

Distinguishing diploids from allopolyploids in the Silene uralensis complex

genetic distances between phased alleles

Resolving the taxonomy of diploids of S. uralensis complex

26 of 77

Materials and methods

Sampling

75 specimens; herbarium material or silica-dried samples

year 1871-2016

Laboratory procedure

Library preparation: NEBNext Ultra II FS DNA Library Preparation Kit for Illumina

half volume reactions

PCR barcoding: NEBNext Multiplex Oligos for Illumina Dual Index Primers Set 1
Double-sided size selection: AMPure XP beads
Target enrichment: Silene-specific probe set

Quality control

Qubit, NanoDrop, TapeStation

27 of 77

Materials and methods

Bioinformatics

De novo contigs created with SPAdes v.3.15.2 using default parameters
Consensus sequences from the contigs aligned with the probe sequences using BLAST v.2.2.31
Locus ID retrieved with a custom bash script
Allele phasing of the final sequences performed with WhatsHap v.1
Discarded loci when: no SNPs were found & when large portions (i.e., several hundred bp)

42 loci out of 48 were kept for further analyses

28 of 77

Materials and methods

Genetic distances between phased haplotypes

Expected bimodal distribution (allopolyploids having larger distance)
Calculated for each sample with a custom Python script
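The authors' custom script is not shown on the slides. As a rough illustration of the idea, the sketch below computes an uncorrected p-distance between the two phased haplotypes of each locus and averages it over loci for one sample. The function names and toy sequences are hypothetical, and the distance measure used in the paper may differ.

```python
import math
from typing import List, Tuple


def p_distance(hap1: str, hap2: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    compared = 0
    differences = 0
    for a, b in zip(hap1.upper(), hap2.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    return differences / compared if compared else math.nan


def mean_haplotype_distance(loci: List[Tuple[str, str]]) -> float:
    """Average p-distance between phased haplotypes across loci for one sample."""
    distances = [p_distance(h1, h2) for h1, h2 in loci]
    usable = [d for d in distances if not math.isnan(d)]
    return sum(usable) / len(usable) if usable else math.nan


# Toy sample (hypothetical): two loci, each with two phased haplotypes.
sample = [("ACGTACGT", "ACGTACGA"), ("AAAA", "AAAT")]
mean_distance = mean_haplotype_distance(sample)
```

Under the expectation stated above, per-sample means like this one would fall into two groups, with allopolyploids in the group with larger distances.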

Phylogenomic analysis

Only diploids with target recovery > 50%
56 samples, 42 loci
Multispecies coalescent model (STACEY version 1.1.1 implemented in BEAST version 2.7.3)
Unlinked loci

29 of 77

Results

30 of 77

Results

31 of 77

Conclusion

N American and Greenlandic samples form a strongly supported clade (PP = 1)
Average genetic distance between haplotypes higher in allopolyploids

Limitations

Evolutionarily young lineage
Species tree with diploids only
No testing for gene flow (no suitable method for bigger dataset)

32 of 77

From museum drawer to tree: Historical DNA phylogenomics clarifies the systematics of rare dung beetles (Coleoptera: Scarabaeinae) from museum collections

Fernando Lopes, Nicole Gunter, Conrad P. D. T. Gillett, Giulio Montanaro, Michele Rossini, Federica Losacco, Gimo M. Daniel, Nicolas Straube, Sergei Tarasov

Karolina Mahlerová

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0309596

33 of 77

Introduction

Dung beetles (Coleoptera: Scarabaeinae)

Ecosystem services

Dry museum specimens + alcohol stored

NHML, MNHN, MZHF

Nesosisyphus rotundatus (1944): Holotype, potentially extinct.

Onychothecus (1985): Rare genus with no prior molecular data.

Helictopleurus spp. (2003–2010): Endemic Madagascar taxa.

Morphology based, few gene based, no genome wide data

Recover full phylogeny of this group + compare extraction methods

Broad coverage of biogeographical regions and evolutionary lineages.

Combines historical, archival, and fresh/alcohol-preserved specimens.

Potential problem – low yield + degradation due to storage and collection

34 of 77

Materials and methods

96 specimens

70 newly sequenced
26 retrieved from GenBank

Scarabaeinae (true dung beetles) - 67 samples from 42 genera/subgenera.

Outgroup: 29 samples - scarab subfamilies + non-scarab beetles (e.g., Silphidae)

59 alcohol preserved -20 - across museums

40 genera
QIAamp DNA Micro Kit (QIAGEN)

11 historical

Dabney et al. (2013)
Rohland et al. (2004)

35 of 77

Materials and methods

Dual-indexed Illumina libraries - RAPiD Genomics (Gainesville, FL, USA)

Extracted DNA sonicated to length approx. 500 bp
Enrichment via Scarab 3Kv1 UCE probe set

Developed to target approx. 3000 loci in Scarabaeidae
Developed on available beetle genomes

Sequencing platform: Illumina NovaSeq 6000 (2x150 bp).

36 of 77

Data analysis pipeline

Demultiplexing, read cleaning

quality control

Assembly of reads into contigs

Extract UCE loci from contigs (Phyluce + Scarab 3Kv1 probe set)

extracting loci from GenBank

MAFFT alignment

Trimming of alignment

Symmetry test to remove loci that violate phylogenetic model assumptions (p ≤ 0.05)

Dataset construction and defining completeness threshold (Partitioned and unpartitioned variants run for comparison)

ML species tree
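The completeness threshold in the pipeline above can be sketched as a simple filter: a locus is kept when it was recovered in at least a given fraction of samples. This is a minimal illustration with hypothetical locus and sample names, not the Phyluce implementation.

```python
from typing import Dict, List, Set


def filter_by_completeness(
    locus_samples: Dict[str, Set[str]], n_samples: int, threshold: float
) -> List[str]:
    """Return loci recovered in at least `threshold` (0-1) of all samples."""
    return sorted(
        locus
        for locus, samples in locus_samples.items()
        if len(samples) / n_samples >= threshold
    )


# Toy data (hypothetical): four samples, four UCE loci with decreasing recovery.
recovered = {
    "uce-1": {"s1", "s2", "s3", "s4"},
    "uce-2": {"s1", "s2", "s3"},
    "uce-3": {"s1", "s2"},
    "uce-4": {"s1"},
}
strict = filter_by_completeness(recovered, n_samples=4, threshold=0.7)
relaxed = filter_by_completeness(recovered, n_samples=4, threshold=0.5)
```

A stricter threshold gives a more complete matrix with fewer loci, while a relaxed threshold keeps more loci at the cost of more missing data.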

37 of 77

Results

(A) Distribution of read length generated per sample, demonstrating that the density of shorter reads was generally higher from archival extractions

(B) Distribution of the number of UCE loci per sample, demonstrating that the greatest density of samples that generated large numbers of captured loci resulted from archival extractions

Shorter reads for dry museum specimens → Museum specimens, when processed correctly, yielded the highest density of samples with large numbers of recovered loci

38 of 77

Results

Gene trees were tested using symmetry tests in IQ-TREE2, and loci failing this test (p ≤ 0.05) were removed

Tested 70% and 50% completeness threshold

70% - more complete alignment but fewer loci (293 loci)

The species tree is based on 1497 UCE
Representing the 50% completeness dataset - a balance between resolution and missing data

Dry-preserved historical specimens (e.g. Nesosisyphus rotundatus, Onychothecus tridentigeris) are highlighted in bold

Hollow dots indicate fully supported nodes (bootstrap = 100).
Numbered nodes show bootstrap < 100

39 of 77

Repeat proliferation and partial endoreplication jointly shape the patterns of genome size evolution in orchids

Zuzana Chumová, Eliška Záveská, Petra Hloušková, Jan Ponert, Philipp-André Schmidt, Martin Čertner, Terezie Mandáková, Pavel Trávníček

Přemysl Baláž

40 of 77

Motivation

Pleurothallidinae is the largest subtribe of orchids, with 5000+ species (16-20 % of orchid diversity) in Central and South America
Emergence of Pleurothallidinae dates back to the split of the Farallon plate (20 Mya)
Single or few markers increasingly fail to resolve phylogeny as finer and more recent groups are inspected
But most importantly, large variation in genome size (216 Mbp - 5.5 Gbp) and GC-content (~26 % - ~43 %)
Chromosome counts are known for only a few species
What is the adaptive potential of partial endopolyploidy?

41 of 77

Partial endopolyploidy

Gene-rich parts of genome are duplicated (black), while gene-poor and repeats-rich portions of genomes (orange) are not replicated in differentiating cells

Whole genomes are duplicated in differentiating cells

42 of 77

43 of 77

Custom probe design and data analysis

Designed with Sondovac, using the transcriptome of Masdevallia yuangensis (1KP) and skimming data from Stelis pauciflora
Probe set for 4956 exons (1200 low-copy genes)
HybPhyloMaker was used for analysis; only genes with 0 % missing species were selected for further analysis
Gene trees were constructed under the GTR+GAMMA model in RAxML, and species trees were reconstructed with ASTRAL. A concatenation approach was also used
Dating both the species (RelTime, MEGA X) and plastome tree (BEAST)
Ancestral state reconstructions and phylogeny-aware regression via R packages

44 of 77

Results

Large variation in genome size (216 Mbp - 5.5 Gbp) and GC-content (~26 % - ~43 %)

Genome size depends on several factors, including repeat content and type of endoreplication, with almost no dependence on chromosome number
GC-content depended only on repeat content

12-90 chromosomes, ancestral reconstruction failed

45 of 77

Results

Genus-level species tree showed conflict with plastome tree:

Specklinia 2 shows hybridization
Dracula 2 shows hybridization (with Masdevallia)
“Clade D” shows conflict

Species-level species tree shows lack of internal support in several genera as well as conflict with the concatenated dataset tree

46 of 77

47 of 77

Dracula

48 of 77

Pleurothallis

49 of 77

Restrepia

50 of 77

Discussion topics

Paralogs - could they impact species tree topology?
Impact of incomplete taxon sampling (341 species out of 5000+ species)
What method(s) to use to resolve poorly supported groups (Dracula, Pleurothallis, Restrepia)?
How to account for hybridization/introgression?
Polyploids?

51 of 77

How deep the rabbit hole goes

Each genus should be analysed separately

Diploidized polyploids?

70-chromosomes group vs rest of species with 36-48 chromosomes

52 of 77

A target enrichment high throughput sequencing system for characterization of BLV whole genome sequence, integration sites, clonality and host SNP
Nagaki Ohnuki, Tomoko Kobayashi, Misaki Matsuo, Kohei Nishikaku, Kazuya Kusama, Yasushi Torii, Yasuko Inagaki, Masatoshi Hori, Kazuhiko Imakawa & Yorifumi Satou

Dimple

53 of 77

INTRODUCTION

Bovine leukemia virus (BLV) is an oncogenic retrovirus which induces malignant lymphoma termed enzootic bovine leukosis (EBL) after a long incubation period.

EBL is generally characterized by local and systemic tumor development following a long incubation period of several years after BLV infection

Only 5% of BLV infected cattle are thought to develop EBL

The BLV genome is reverse transcribed to double-stranded DNA and integrated into the host genome as a provirus.

genes coding regulatory protein

major genes coding structural proteins and enzymes

Oncogenic

54 of 77

OBJECTIVES

Aims of the Paper:

To investigate the complex factors involved in BLV-related disease progression, including viral polymorphisms, integration sites (ISs) in the host genome, and clonal proliferation of infected B cells.
To apply and evaluate a BLV-targeted DNA capture sequencing (DNA-capture-seq) method as a technique for comprehensive analysis of the integrated BLV provirus and host genome sequences.
To identify and characterize BLV integration sites (ISs) by analyzing host genomic sequences adjacent to the integrated BLV using enriched DNA.
To analyze BLV-infected B cell clonality by determining the frequency and abundance of ISs, which reflect the proliferation of specific infected clones.
To detect host genetic variations (SNPs) using capture probes specific to host genes, and assess their potential role in disease or EBL progression.

55 of 77

Materials and methods

1) DNA probes: 145 probes designed to capture the BLV genome and the host TNF-α promoter region

2) Sample collection and genotype analysis (genotype 1, genotype 3); categorization as asymptomatic (AS) or persistent lymphocytosis (PL)

3) Library prep (NEBNext Ultra II DNA Library Prep Kit and NEBNext Multiplex Oligos for Illumina) and targeted enrichment (biotin-labelled DNA probes; DNA libraries enriched for proviral sequences)

4) High-throughput sequencing data analysis

5) IS and proviral structure analysis with DNA-seq data

6) PCR amplification of viral-host junctions

7) Estimation of B cell clone abundance to understand progression of diseases such as leukemia

8) Estimates of evolutionary divergence between BLV genotypes (average number of base differences per site between sequences)

9) Detection of sites under positive selection in the viral genome, focused on genotype 3

56 of 77

Results

Validation of BLV capture probe (Fig. 1A-C): The sample generated 0.97 million raw reads, of which 0.61 million reads (63.3%) mapped to the host genome (the Ovine Genome Assembly Oar_v4.0) and 0.1 million reads (10.2%) mapped to the target BLV reference genome (EF600696). The sequence reads mapped to the target BLV reference genome covered 100% of the BLV genome (Fig. 1B). The phylogenetic analysis revealed that the full genome consensus sequence reconstructed from mapped reads clustered in the same clade as the previously reported FLK-BLV sequences (accession numbers: LC164083 and EF600696) (Fig. 1C).
BLV ISs in FLK-BLV cell line (Fig. 2A, B)
Application of DNA-capture-seq to the study of BLV-infected and EBL cattle
Application of DNA-capture-seq to the study of asymptomatic (AS) and persistent lymphocytosis (PL) cattle (Fig. 3C, D)
SNP detection in BLV-infected cattle

57 of 77

Figure 1. Application of DNA-capture-seq to analyze BLV proviral sequences and ISs. (A) Schematic diagram of the application of the target enrichment. (B) Visualization of sequence reads mapped to the FLK-BLV sequence (EF600696). NGS reads mapped to BLV are shown below the reference sequence. (C) Maximum-likelihood phylogenetic tree of BLV whole genome sequences, generated with five newly obtained sequences (a sequence from the FLK-BLV cell line indicated by a filled square and four sequences from EBL tumor samples indicated by filled circles) together with 53 sequences from the GenBank database. The phylogenetic tree was generated and visualized using MEGA 7 with 1000 bootstrap replicates. The bar at the bottom of the figure denotes the estimated number of amino acid substitutions per site, indicating genetic variation for the length of the scale.

Validation of BLV capture probe

58 of 77

2) BLV ISs in FLK‑BLV cell line

59 of 77

low OCI values suggest abundant representation of multiple clones

polyclonal populations

Monoclonal populations

3) Application of DNA-capture-seq to the study of asymptomatic (AS) and persistent lymphocytosis (PL) cattle & integration site (IS) analysis

60 of 77

4) Application of DNA-Capture-Seq to BLV-Infected and EBL Cattle

Phylogenetic studies classify BLV into 11 genotypes globally.
Genotype 1 is predominant; genotype 3 is rare, with only one full-length genotype 3 sequence reported previously (LC164084).

Forty-two amino acid substitutions unique to genotype 3 were identified.
Notable substitutions included:

82F, 133T, and 254L in the ENV protein at the conformational epitope, neutralization domain 2 (ND2), and linear epitope, respectively.
80S in the Rex protein, located at the nuclear export signal (NES).
43K, 164P, and 185T in the Tax protein at the putative zinc finger and leucine-rich activation domains.

While the functional effects of these substitutions remain unclear, they might contribute to the low prevalence of BLV genotype 3 in Japan.

61 of 77

5) SNP detection in BLV-infected cattle

A SNP at position -824 (G/A) in the TNF-α promoter was analyzed:

EBL cattle: 75% had the G allele (associated with higher proviral load and lymphoma risk)
AS and PL cattle: only 25% had the G allele

This suggests a potential genetic predisposition to lymphoma development linked to this SNP

62 of 77

63 of 77

Conclusion:

DNA-capture-seq proves effective for:

Whole-genome analysis of BLV,
Detection of genotype-specific mutations,
Identification and validation of proviral integration sites (ISs),
Estimation of clonal diversity in BLV-infected tumors.

64 of 77

Targeted Enrichment of Large Gene Families for Phylogenetic Inference: Phylogeny and Molecular Evolution of Photosynthesis Genes in the Portullugo Clade (Caryophyllales)

ABIGAIL J. MOORE, JURRIAAN M. DE VOS, LILLIAN P. HANCOCK, ERIC GOOLSBY, AND ERIKA J. EDWARDS

https://pubmed.ncbi.nlm.nih.gov/29029339/

Martina Omelková

65 of 77

Background and Study Goals

Including multigene families in a hybrid enrichment study of the “portullugo” (Caryophyllales), which includes nine major lineages (2200 species).
Relationships among many of its major lineages remain unresolved.
➤ One of the biggest problems is the relationship between the cacti, Portulaca, and Anacampserotaceae.
Portullugo also harbors multiple origins of two plant metabolic pathways: C4 and CAM photosynthesis.
They focused on the molecular evolution of genes coding for the major C3, C4, and CAM photosynthesis enzymes during evolutionary transitions between these metabolic pathways.
➤ They included 19 major photosynthesis gene families in the hybrid enrichment design.

66 of 77

Background and Study Goals

Hyb-Seq has been used primarily to capture single-copy nuclear genes that are easier to analyze phylogenetically.
The authors aimed to test whether Hyb-Seq could be extended to capture large, multi-gene families, which pose additional challenges due to gene duplications and paralogy.
The goal was to use this approach not only to reconstruct a robust phylogeny of the Portullugo clade but also to investigate the molecular evolution of genes involved in specialized photosynthetic pathways such as CAM and C₄ metabolism

CAM and C₄ photosynthesis evolved independently multiple times (classic examples of convergent evolution)
Gene duplication provides the raw genetic material for evolutionary innovation (one gene copy can retain its original function, while the other may neofunctionalize)
In photosynthetic pathways, specific paralogs of key enzymes (e.g., PEPC, PPDK, NADP-ME) have different adaptations for CAM or C₄ roles.

67 of 77

Methods

The study sampled 74 species across the Portullugo clade, including families like Cactaceae, Portulacaceae, and Anacampserotaceae.
Baits for targeted enrichment were designed for use across the portullugo based on analyses of eight previously sequenced transcriptomes from the Portulacineae (previous work) and four from its sister group Molluginaceae (from the 1000 Plants transcriptome sequencing project).
MyBaits baits were designed from two sets of genes:
➤ 19 gene families that were known to be important in CAM and C4 photosynthesis
➤ 52 other nuclear genes

68 of 77

Data Processing

They designed a three-part bioinformatics pipeline to reconstruct gene sequences.
➤ Part I aimed to extract all relevant reads for each gene family and then assemble them into contigs.

69 of 77

➤ Part II then constructed longer sequences from contigs and assigned them to particular paralogs within a gene family.

70 of 77

➤ Part III identified gene duplications within gene families, extracted phylogenetically useful sets of orthologs, and used them for phylogenetic analysis.

71 of 77

Results – Phylogenetic Relationships in Portullugo

72 of 77

Robust backbone phylogeny

Using 387 nuclear loci, the authors reconstructed a well-resolved phylogeny of the Portullugo clade.
High bootstrap and concordance support across most nodes.
Confirms previous hypotheses about relationships, but also clarifies ambiguous branches, especially near the base of Cactaceae.

73 of 77

Gene tree discordance

Significant gene tree conflict detected among loci, even with high-quality alignments.
Likely caused by:
➤ Incomplete lineage sorting, hidden paralogy, gene flow or introgression
The ASTRAL coalescent approach and BUCKy concordance analyses helped distinguish reliable clades from areas of conflict.

To evaluate genomic support for relationships among major clades of portullugo, they used Bayesian concordance analysis in BUCKy (it is based on the posterior distribution of gene trees from analyses in MrBayes 3.2)

BUCKy detects groups of genes supporting the same topology, while accounting for uncertainty in gene tree estimates. BUCKy requires each individual to be present in trees for all loci.

BUCKy estimates the genomic support as a concordance factor (CF) for each relationship found across analyses of all individual loci

In spite of this congruence, our BUCKy analyses revealed strong and significant discord among loci for these relationships, with roughly half of our sampled genome (mean CF 0.52) supporting ((A,P),C) and roughly 25% supporting either (A(P,C)) or (P(A,C))

It is important to note that this discord among individual gene trees is not derived from poorly supported topologies of individual loci. Due to the congruence and overall strong support for ((A,P)C) by multiple inference methods and alternative matrices, we tentatively accept this topology and present it as the best working hypothesis for ACP relationships.
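To make the idea of a concordance factor concrete, the sketch below tallies the share of gene trees supporting each of the three possible resolutions of the A, C, P triplet. This is a deliberately naive count over point estimates of gene trees, whereas BUCKy works on posterior distributions of gene trees and accounts for their uncertainty, so the function name, the input format and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def topology_proportions(gene_tree_topologies: List[str]) -> Dict[str, float]:
    """Share of gene trees supporting each topology (naive concordance)."""
    if not gene_tree_topologies:
        raise ValueError("at least one gene tree is required")
    counts = Counter(gene_tree_topologies)
    total = sum(counts.values())
    return {topology: n / total for topology, n in counts.items()}


# Toy data (hypothetical): eight gene trees for the A, C, P triplet.
trees = (
    ["((A,P),C)"] * 4
    + ["(A,(P,C))"] * 2
    + ["(P,(A,C))"] * 2
)
support = topology_proportions(trees)
```

The toy proportions mimic the pattern described above, where about half of the loci support ((A,P),C) and the remainder is split between the two alternatives.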

74 of 77

Role of paralogs

Including multi-copy gene families improved resolution for some nodes, but also increased conflict in others.

75 of 77

Results – Gene Family Evolution and Photosynthetic Pathways

A secondary goal of their study was to design a bait enrichment scheme that would allow them to simultaneously build a large database of genes relevant to the evolution of C4 and CAM photosynthesis.
Their previous work on PEPC evolution in this lineage identified five Portulacineae-specific gene duplications within ppc-1E1, the major ppc paralog that is most often recruited into C4 function across eudicots

Phylogenetic tree of ppc-1E1 obtained via RAxML (phosphoenolpyruvate carboxylase is a key enzyme in alternative forms of photosynthesis).
One of the major ppc paralogs, ppc-1E1, has undergone multiple rounds of duplication in ancestral Portulacineae (five: a-e)

Amino acids specifically associated with C4 in grasses are in boldface, and the number of boldface amino acids for a lineage is indicated with red, blue, gray, and purple horizontal bars.
Red bars indicate a C4 lineage, blue bars indicate a lineage with CAM activity, light blue bars indicate suspected CAM activity, purple indicates both C4 and CAM, and gray indicates a C3 lineage.

76 of 77

Summary- Photosynthetic Pathways

Multiple lineages within Portullugo evolved CAM photosynthesis independently.
➤ In each case, different paralogs of key CAM-related enzymes (e.g. PEPC) were recruited.
Multiple PEPC paralogs were identified and phylogenetically analyzed.
➤ CAM-associated paralogs form distinct clades within families, indicating lineage-specific recruitment.
➤ Inferred expression shifts and possible functional divergence in CAM-specific paralogs.

77 of 77

Broader implications

This study shows that target enrichment is a powerful tool not only for species phylogeny, but also for understanding the molecular basis of complex trait evolution.
Highlights the importance of analyzing gene families, not just single-copy genes, in evolutionary biology.
These results demonstrate that combining phylogenomics and gene family analysis opens a window into both lineage relationships and adaptive molecular evolution.