1 of 73

Hereditary Stratigraphy Methods for Phylogenetic Inference over Distributed EC Populations

June 1, 2023 @ GPTP

Matthew Andres Moreno

Ecology & Evolutionary Biology/Complex Systems

University of Michigan

@MorenoMatthewA

2 of 73

Phylogenetic Analysis in Evolutionary Computing

  • Give an anecdotal account of evolutionary history to better understand instances of evolutionary innovation (Lenski et al., 2003)
  • Detect ecological dynamics (Dolson and Ofria, 2018)
  • Characterize selection pressure (Hagstrom et al., 2004)
  • Study spatial aspects of evolutionary innovation (Dolson and Ofria, 2017)
  • Apply in online mechanisms for evolutionary computation (Lalejini et al., 2023)

(Hernandez et al.)

3 of 73

perfect tracking

10 of 73

Serial perfect tracking: easy, efficient, & robust

Parallel & distributed perfect tracking: complex, potentially fragile, & expensive

Biological phylogenetic analysis through post-hoc inference is robust and decentralized.

🐶

🐰

infer

Research Question:

How to design genomes to maximize phylogenetic reconstructability?

new methodology & plug-’n’-play software tools

🐶

🐰

14 of 73

goal

??? how

how

result

Talk structure

✂️

  • reconstruct phylogenetic history
  • estimate population sizes across history
  • detect gene-level selection events

focus on sexual populations

15 of 73

genomes from

asexual population

phylogeny

goal

1

18 of 73

genomes from

asexual population

phylogeny

??? how

how

genome

19 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

genome

20 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

📌‏ ‎

21 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

gen 0

📌‏ ‎

22 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

gen 0

gen 1

📌‏ ‎

📌‏ ‎

📌‏ ‎

23 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

gen 0

gen 1

gen 2

➔ ➔

📌‏ ‎

📌‏ ‎

📌‏ ‎

📌‏ ‎

24 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

gen 0

gen 1

gen 2

gen 3

➔ ➔ ➔

📌‏ ‎

📌‏ ‎

📌‏ ‎

📌‏ ‎

📌‏ ‎

25 of 73

genomes from

asexual population

phylogeny

??? how

how

📌‏ ‎

instrumentation

gen 0

gen 1

gen 2

gen 3

➔ ➔ ➔

📌‏ ‎

gen 3

📌‏ ‎

📌‏ ‎

📌‏ ‎

📌‏ ‎

📌‏ ‎
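
The build-up across these slides (one new 📌 per generation, inherited by descendants) can be sketched in a few lines. This is a minimal illustration, not the talk's library code: the names `new_column`, `descend`, and `mrca_generation_bound`, and the 8-bit marker width, are assumptions made for the example.

```python
import random

DIFFERENTIA_BITS = 8  # assumed marker width; wider markers collide less often


def new_column(rng):
    """Founder annotation: a single random marker deposited at generation 0."""
    return [rng.getrandbits(DIFFERENTIA_BITS)]


def descend(parent_column, rng):
    """Offspring inherit every parental marker, then append one fresh marker."""
    return parent_column + [rng.getrandbits(DIFFERENTIA_BITS)]


def mrca_generation_bound(col_a, col_b):
    """Return the last generation at which the two columns still agree.

    Markers are compared oldest-first; the first mismatch means the lineages
    had already split by that generation. Returns -1 if even the founding
    markers differ (no evidence of shared ancestry).
    """
    last_match = -1
    for gen, (a, b) in enumerate(zip(col_a, col_b)):
        if a != b:
            break
        last_match = gen
    return last_match


rng = random.Random(1)
ancestor = new_column(rng)
for _ in range(3):
    ancestor = descend(ancestor, rng)  # shared history through generation 3

left, right = ancestor, ancestor
for _ in range(5):
    left = descend(left, rng)
    right = descend(right, rng)  # independent lineages from generation 4 on

print(mrca_generation_bound(left, right))
```

Because markers are random, two diverged lineages can match by chance at a given generation (probability 1/256 with 8-bit markers in this sketch), so the returned value is an estimate that can overshoot the true split.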

28 of 73

Instrumentation space vs. accuracy

??? how

how

34 of 73

Instrumentation space vs. accuracy

??? how

how

Tradeoff: space vs MRCA estimate uncertainty

⬇️ pruning ⬇️

Uncertainty when estimating MRCA

✂️

-- pocket slides --

(happy to say a little more in Q&A)
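
To make the space vs. uncertainty tradeoff concrete, here is a small sketch. It assumes a hypothetical pruning rule (keep every `stride`-th marker plus the newest) and ignores chance marker collisions; `retained_ranks` and `mrca_window` are illustrative names, not part of any published tool.

```python
def retained_ranks(num_generations, stride):
    """Keep every `stride`-th marker plus the newest one (hypothetical policy)."""
    kept = set(range(0, num_generations, stride))
    kept.add(num_generations - 1)
    return sorted(kept)


def mrca_window(kept, true_mrca_rank):
    """Bracket the true MRCA rank between adjacent retained ranks.

    Two lineages agree at every retained rank <= the true MRCA and (barring
    marker collisions) disagree at every later one, so the MRCA can only be
    localized to [last agreeing rank, first disagreeing rank).
    """
    lower = max(r for r in kept if r <= true_mrca_rank)
    later = [r for r in kept if r > true_mrca_rank]
    upper = min(later) if later else None
    return lower, upper


for stride in (1, 4, 16):
    kept = retained_ranks(64, stride)
    lower, upper = mrca_window(kept, true_mrca_rank=21)
    print(f"stride={stride:2d} markers={len(kept):2d} window=[{lower}, {upper})")
```

Fewer retained markers means less space per genome, but the window that brackets the MRCA widens accordingly.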

35 of 73

68 bytes/genome; 262,144 generations with population size 32,768 (subsample of 100 leaves shown)

Example phylogeny reconstruction

result

36 of 73

genomes from

sexual population

phylogeny

goal

1

37 of 73

phylogeny

genomes from

sexual population

how

38 of 73

phylogeny

genomes from

sexual population

how

<

44 of 73

Example phylogenetic reconstruction

result

✂️

45 of 73

genomes from

sexual population

historical population

size estimates

goal

2

48 of 73

genomes from

sexual population

historical population

size estimates

time

Population size

goal

2

51 of 73

genomes from

sexual population

historical population

size estimates

how

max(🎲,🎲,🎲,🎲,🎲,🎲,🎲,🎲,🎲)

max(🎲,🎲,🎲)

vs

4 observations -> 95% CI spanning an 8-fold range
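
The dice picture suggests one way to reason about this: the maximum of many draws tends to be larger than the maximum of few, so observed maxima carry information about how many draws (individuals) there were. The sketch below assumes each observation is the maximum of n independent Uniform(0, 1) draws, which is a modeling assumption made for illustration and not necessarily the exact estimator used in the talk. Under that assumption, n times the sum of -ln(x) over k observations follows a Gamma(k, 1) distribution, which gives both a maximum-likelihood estimate and a confidence interval. All function names are hypothetical.

```python
import math
import random


def erlang_cdf(x, k):
    """CDF of a Gamma(shape=k, scale=1) variable for integer k."""
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def erlang_quantile(p, k):
    """Invert erlang_cdf by bisection."""
    lo, hi = 0.0, 100.0 * k
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if erlang_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def estimate_pop_size(maxima):
    """MLE of n when each observation is the max of n Uniform(0, 1) draws.

    Returns (estimate, (lower, upper)) with a 95% interval; the pivot is
    n * sum(-ln x), which follows Gamma(k, 1) for k observations.
    """
    k = len(maxima)
    s = sum(-math.log(x) for x in maxima)
    lower = erlang_quantile(0.025, k) / s
    upper = erlang_quantile(0.975, k) / s
    return k / s, (lower, upper)


rng = random.Random(42)
true_n = 9  # matches the nine dice on the slide
maxima = [max(rng.random() for _ in range(true_n)) for _ in range(4)]
estimate, (lower, upper) = estimate_pop_size(maxima)
print(f"estimate={estimate:.1f}, 95% CI=({lower:.1f}, {upper:.1f}), "
      f"upper/lower={upper / lower:.2f}")
```

In this model the ratio upper/lower depends only on the number of observations. For four observations it is the ratio of the 97.5% and 2.5% quantiles of Gamma(4, 1), roughly 8, which is consistent with the 8-fold figure on the slide.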

52 of 73

Example population size estimation

result

53 of 73

detection of gene-level selection

goal 3

55 of 73

detection of gene-level selection

goal 3

vs.

57 of 73

detection of gene-level selection

how

16 generations

Gen. n

📸.

✂️

-- pocket slides --

59 of 73

Detection of gene-level selection

result

allele frequency

signal

61 of 73

Conclusion

From extant members of a distributed population,

  • reconstruct phylogenetic history (sexual & asexual)
  • estimate population sizes across history
  • detect gene-level selection events

Methods & tools that may be useful in your system,

  • Python library (PyPI) & C++ library (soon)
  • Grab off the shelf or reach out & let’s collaborate!

62 of 73

Acknowledgment

Collaborators

Advisors

Emily Dolson

Santiago Rodriguez Papa

Charles Ofria

Luis Zaman

63 of 73

Bibliography

O'Neill, Bill. "Digital evolution." PLoS Biology 1.1 (2003): e18.

Dolson, Emily, and Charles Ofria. "Spatial resource heterogeneity creates local hotspots of evolutionary potential." ECAL 2017, the Fourteenth European Conference on Artificial Life. MIT Press, 2017.

Dolson, Emily, and Charles Ofria. "Ecological theory provides insights about evolutionary computation." Proceedings of the Genetic and Evolutionary Computation Conference Companion. 2018.

Lalejini, Alexander, et al. "Phylogeny-informed Lexicase Selection." GPTP (2023).

Hagstrom, George I., et al. "Using Avida to test the effects of natural selection on phylogenetic reconstruction methods." Artificial life 10.2 (2004): 157-166.

Lenski, Richard E., et al. "The evolutionary origin of complex features." Nature 423.6936 (2003): 139-144.

Kauffman, Stuart, and Simon Levin. "Towards a general theory of adaptive walks on rugged landscapes." Journal of theoretical Biology 128.1 (1987): 11-45.

64 of 73

Images

Eric Gaba for Wikimedia Commons, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons

David Abián, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons

Jose Guadalupe Hernandez, Alexander Lalejini, and Emily Dolson. 2022. Phylogenetic diversity predicts future success in evolutionary computation. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '22). Association for Computing Machinery, New York, NY, USA, 23–24. https://doi.org/10.1145/3520304.3534079

65 of 73

Questions?

@MorenoMatthewA

68 of 73

Pruning: distribution

more recent

more ancient

⬆️ evenly pruned vs. recency-proportional pruned ⬇️

more retention

less retention
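
The contrast between the two pruning distributions can be sketched as a static snapshot. This example only shows which ranks each scheme might retain at one point in time; it does not address how to prune consistently as generations elapse. The power-of-two spacing in `recency_proportional` is an assumption chosen for simplicity, and all function names are hypothetical.

```python
def evenly_spaced(num_generations, num_kept):
    """Spread `num_kept` retained ranks uniformly from oldest to newest."""
    step = (num_generations - 1) / (num_kept - 1)
    return sorted({round(i * step) for i in range(num_kept)})


def recency_proportional(num_generations):
    """Retain ranks at ages 0, 1, 2, 4, 8, ... plus the founding rank.

    Gaps between retained ranks grow with age, so recent history is
    resolved finely and ancient history coarsely.
    """
    newest = num_generations - 1
    kept = {0, newest}
    age = 1
    while age < newest:
        kept.add(newest - age)
        age *= 2
    return sorted(kept)


def gaps(kept):
    """Widths of the unretained stretches between consecutive kept ranks."""
    return [b - a for a, b in zip(kept, kept[1:])]


even = evenly_spaced(64, 8)
recent = recency_proportional(64)
print("even:   ", even, gaps(even))
print("recency:", recent, gaps(recent))
```

With 64 generations, both schemes keep eight markers here, but the recency-proportional snapshot concentrates them near the newest generation.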

69 of 73

435 bytes/genome; 262,144 generations with population size 32,768 (subsample of 100 leaves shown)

Example phylogeny reconstruction

result

70 of 73

detection of gene-level selection

how

a

b

c

d

e

f

g

h

i

j

ab

cd

cde

📸.

16 generations

ij

hij

ghij

cdef

abcdefi

abcdefghij

i

Gen. n

≥10 copies

@ 16 gen

71 of 73

[Figure: timeline gen 0 ➔ gen 1 ➔ gen 2 ➔ gen 3 ➔ gen 4 ➔ gen 5 ➔ gen 6 ➔ gen 7 ➔ gen 8, aligned with rank 0 through rank 8; six entries are marked x.]

72 of 73

Record allocation: space complexity over time

[Figure: records retained between oldest and newest at t=8 and t=16 (8-generation spans), comparing uniform and recency-proportional record allocation. Axes: time, space. Labeled space complexities: O(n), O(log n), O(1); an upper bound is marked.]

73 of 73