Hereditary Stratigraphy:
Genome Annotations to Enable Phylogenetic Inference over
Distributed Digital Evolution Populations
Matthew Andres Moreno @MorenoMatthewA
Emily Dolson @emilyldolson
Charles Ofria @charlesofria
Evolution Conference
June, 2022
Digital Evolution
10011011110100
Computer models with evolving agents (O'Neill, 2003)
Idea: instantiate evolutionary processes in a computer program.
Why? Unique capabilities.
@MorenoMatthewA
Phylogenetic Analysis in Digital Evolution
Why?⏩
Perfect Phylogenetic Tracking in Digital Evolution
death event
birth event
perfect record
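A minimal sketch of how birth and death events could be logged to keep a perfect record, assuming a single tracker that every organism reports to. The class name `PhylogenyTracker`, its methods, and the pruning of extinct leaf lineages are hypothetical choices for illustration, not taken from any particular digital evolution framework.

```python
class PhylogenyTracker:
    """Hypothetical centralized tracker: every birth and death is reported here."""

    def __init__(self):
        self.parent = {}      # taxon id -> parent id (None for a root)
        self.n_children = {}  # taxon id -> number of recorded children
        self.alive = set()    # ids of extant organisms
        self._next_id = 0

    def birth(self, parent_id=None):
        """Record a birth event and return the new organism's id."""
        taxon_id = self._next_id
        self._next_id += 1
        self.parent[taxon_id] = parent_id
        self.n_children[taxon_id] = 0
        if parent_id is not None:
            self.n_children[parent_id] += 1
        self.alive.add(taxon_id)
        return taxon_id

    def death(self, taxon_id):
        """Record a death event; drop lineages left with no living descendants."""
        self.alive.discard(taxon_id)
        while (
            taxon_id is not None
            and taxon_id not in self.alive
            and self.n_children[taxon_id] == 0
        ):
            parent_id = self.parent.pop(taxon_id)
            del self.n_children[taxon_id]
            if parent_id is not None:
                self.n_children[parent_id] -= 1
            taxon_id = parent_id

    def lineage(self, taxon_id):
        """Return the exact line of descent from an organism back to its root."""
        path = []
        while taxon_id is not None:
            path.append(taxon_id)
            taxon_id = self.parent[taxon_id]
        return path


tracker = PhylogenyTracker()
root = tracker.birth()
a = tracker.birth(root)
b = tracker.birth(root)
c = tracker.birth(a)
tracker.death(a)  # a stays in the record because its child c is alive
tracker.death(b)  # b had no descendants, so its record is dropped
print(tracker.lineage(c))
```

Every event has to reach the same record, which is the assumption that becomes hard to keep once a population is spread over many processors.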
Scaling Up Perfect Tracking: Challenges
vs.
lost data
Alternate Approach: Reconstruction
extant organisms
✨ inference ✨
❗estimated❗ not ground truth
in digital evolution, we control how these work
Research Question:
How can we design genomes to maximize phylogenetic reconstructability?
A Naive Approach
gen 0: 100110111
gen 1: 100110011
gen 2: 110110011
gen 3: 110010011 | 110110010
gen 4: 110000011 | 111110010
gen 5: 010000011 | 111100010
extant: 010000011, 111100010
measure distance => infer estimate of generations elapsed since most recent common ancestor
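The distance-based estimate can be sketched as follows, using the two extant bitstrings shown on the slide. The function names are hypothetical, and the estimator assumes exactly one bit flip per generation in each lineage with no repeated or convergent flips.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length bitstrings differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def estimate_generations_since_mrca(a: str, b: str, flips_per_generation: int = 1) -> float:
    """Assumption: each lineage accrues `flips_per_generation` differences per
    generation, so the distance between two tips is twice the elapsed time."""
    return hamming(a, b) / (2 * flips_per_generation)


extant_a = "010000011"
extant_b = "111100010"

print(hamming(extant_a, extant_b))                          # 4
print(estimate_generations_since_mrca(extant_a, extant_b))  # 2.0
```

In the slide's example the two lineages split at gen 2, three generations before the extant genomes, but both later flipped the same bit. The observed distance is therefore 4 rather than 6, and the estimate comes out at 2 generations instead of 3.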
Challenges:
A More Clever Approach
“Hereditary Stratigraph”
“Fingerprint”: randomly generated packet of data
A More Clever Approach
gen 0
gen 1
gen 2
gen 3
gen 4
gen 5
extant
extant
different
same
end of common ancestry
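A minimal sketch of the comparison above, assuming each genome carries a plain list holding one 64-bit random fingerprint per generation. The helper names (`new_column`, `descend`, `last_common_generation`) are hypothetical and chosen only for this illustration.

```python
import random

FINGERPRINT_BITS = 64  # assumed fingerprint width


def new_column(rng):
    """Start a column with a single gen 0 fingerprint."""
    return [rng.getrandbits(FINGERPRINT_BITS)]


def descend(column, rng):
    """Offspring inherits the parent's fingerprints and appends a fresh one."""
    return column + [rng.getrandbits(FINGERPRINT_BITS)]


def last_common_generation(col_a, col_b):
    """Walk from gen 0 forward; the first mismatch marks the end of common ancestry."""
    last_same = None
    for generation, (fp_a, fp_b) in enumerate(zip(col_a, col_b)):
        if fp_a != fp_b:
            break
        last_same = generation
    return last_same


rng = random.Random(1)
shared = new_column(rng)           # gen 0
for _ in range(2):
    shared = descend(shared, rng)  # gen 1, gen 2

lineage_a = lineage_b = shared     # the two lineages split after gen 2
for _ in range(3):                 # gen 3, gen 4, gen 5
    lineage_a = descend(lineage_a, rng)
    lineage_b = descend(lineage_b, rng)

# Expected to report gen 2, barring a chance collision between fingerprints.
print(last_common_generation(lineage_a, lineage_b))
```

Unlike the bitstring distance, matching fingerprints are only ever inherited together, so a spurious match requires two independently drawn 64-bit values to collide.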
A More Clever Approach
“hereditary stratigraph”
gen 0
gen 1
gen 2
gen 3
gen 4
gen 5
attach as neutral “annotation” to any digital genome
…
arbitrary digital genome
Hangup: genome size bloat
If we add a 64-bit “fingerprint” to the genome each generation, then:
1 million generations => 8 MB per genome, 8 GB total for a population of 1k
1 billion generations => 8 GB per genome, 8 TB total for a population of 1k
Genome size grows linearly with number of generations elapsed!
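The arithmetic behind these figures can be checked with a short calculation, assuming decimal units (1 MB = 10^6 bytes) and counting only the fingerprints, not the rest of the genome. The function name is hypothetical.

```python
FINGERPRINT_BITS = 64
POPULATION_SIZE = 1_000


def annotation_bytes(generations: int, fingerprint_bits: int = FINGERPRINT_BITS) -> int:
    """Bytes of fingerprint data carried by one genome with no pruning."""
    return generations * fingerprint_bits // 8


for generations in (10**6, 10**9):
    per_genome = annotation_bytes(generations)
    total = per_genome * POPULATION_SIZE
    print(f"{generations:>13,} generations: "
          f"{per_genome / 1e6:,.0f} MB per genome, {total / 1e9:,.0f} GB total")
```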
Solution: pruning
Tradeoff: space vs MRCA estimate uncertainty
⬇️ pruning ⬇️
Uncertainty when estimating MRCA
the subtle & interesting part! ⏩
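A sketch of where the uncertainty comes from, assuming a pruned column is stored as a mapping from generation to fingerprint and that only generations retained by both genomes can be compared. The helpers `extend`, `prune_evenly`, and `mrca_bounds` are hypothetical names for this illustration.

```python
import random


def extend(lineage, n, rng):
    """Append n fresh 64-bit fingerprints, one per elapsed generation."""
    return lineage + [rng.getrandbits(64) for _ in range(n)]


def prune_evenly(lineage, spacing):
    """Keep every `spacing`-th generation plus the newest; return {generation: fingerprint}."""
    newest = len(lineage) - 1
    return {
        gen: fp
        for gen, fp in enumerate(lineage)
        if gen % spacing == 0 or gen == newest
    }


def mrca_bounds(col_a, col_b):
    """Return (last generation known shared, first generation known different)."""
    lower = None
    for gen in sorted(set(col_a) & set(col_b)):
        if col_a[gen] != col_b[gen]:
            return lower, gen
        lower = gen
    return lower, None


rng = random.Random(0)
common = extend([], 11, rng)      # gens 0..10 are shared
tip_a = extend(common, 10, rng)   # gens 11..20 along lineage a
tip_b = extend(common, 10, rng)   # gens 11..20 along lineage b

# True last common generation is 10 in this constructed example.
print(mrca_bounds(prune_evenly(tip_a, 1), prune_evenly(tip_b, 1)))  # no pruning
print(mrca_bounds(prune_evenly(tip_a, 4), prune_evenly(tip_b, 4)))  # 1 in 4 kept
```

With nothing pruned the bounds pin the split to a single generation; keeping one generation in four shrinks each column to 6 entries but only brackets the split between gen 8 and gen 12.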
Pruning: distribution
more recent
more ancient
⬆️ evenly pruned vs. recency-proportional pruned ⬇️
more retention
less retention
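The two distributions can be contrasted with a small sketch that lists which generations each policy would aim to keep. Both functions are hypothetical illustrations: the recency-proportional one simply keeps generations at power-of-two ages, and it ignores the practical constraint that a fingerprint discarded earlier in a run cannot be recovered later.

```python
def evenly_spaced(newest: int, spacing: int) -> list[int]:
    """Retain generations at a fixed interval, plus the newest."""
    return sorted(set(range(0, newest + 1, spacing)) | {newest})


def recency_proportional(newest: int) -> list[int]:
    """Retain generations whose age is 0 or a power of two, plus gen 0,
    so gaps between retained generations widen with age."""
    ages = {0}
    age = 1
    while age <= newest:
        ages.add(age)
        age *= 2
    return sorted({newest - a for a in ages} | {0})


print(evenly_spaced(100, 10))
# [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(recency_proportional(100))
# [0, 36, 68, 84, 92, 96, 98, 99, 100]
```

Under the second policy, recent divergences are bracketed tightly while ancient ones are bracketed loosely, so uncertainty scales with how long ago the MRCA lived.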
Results ⏩
Results
Conclusion
Takeaways:
Future Work:
Images
https://bevouliin.com/stork-delivery/
https://commons.wikimedia.org/wiki/File:SANUs_Phylogeny.jpg
http://clipart-library.com/clipart/gene-cliparts_8.htm
Bibliography
O'Neill, Bill. "Digital evolution." PLoS Biology 1.1 (2003): e18.
Dolson, Emily, and Charles Ofria. "Spatial resource heterogeneity creates local hotspots of evolutionary potential." ECAL 2017, the Fourteenth European Conference on Artificial Life. MIT Press, 2017.
Dolson, Emily, and Charles Ofria. "Ecological theory provides insights about evolutionary computation." Proceedings of the Genetic and Evolutionary Computation Conference Companion. 2018.
Hagstrom, George I., et al. "Using Avida to test the effects of natural selection on phylogenetic reconstruction methods." Artificial life 10.2 (2004): 157-166.
Lenski, Richard E., et al. "The evolutionary origin of complex features." Nature 423.6936 (2003): 139-144.
Kauffman, Stuart, and Simon Levin. "Towards a general theory of adaptive walks on rugged landscapes." Journal of theoretical Biology 128.1 (1987): 11-45.
Acknowledgement
@MorenoMatthewA @emilyldolson @charlesofria
Matthew
Emily
Charles
Materials are Available!
Different Pruning Options
evenly spaced vs. recency-proportional
how much to prune
how to prune so that you have good data no matter when you stop the simulation
https://cerebras.net/blog/wafer-scale-processors-the-time-has-come/
Scaling Up Digital Evolution Compute
Why?
vs.
Pruning: intensity
⬆️ lightly pruned vs. heavily pruned ⬇️
Results
validate pairwise comparison
interactive viz https://hopth.ru/bh
true MRCA
MRCA estimate bounds
genome 1
genome 2