Pangenome graphs built from raw sets of alignments may have complex structures that complicate downstream analyses, visualization, mapping, and interpretation. Graph sorting aims to find the best node order for 1D and 2D layouts to simplify these complex regions. Pangenome graphs embed linear pangenomic sequences as paths in the graph, but to our knowledge, no algorithm takes this biological information into account when sorting. Moreover, existing 2D layout methods struggle with large graphs. We present a new layout algorithm that simplifies a pangenome graph by using path-guided stochastic gradient descent (SGD) [3] to move a single pair of nodes at a time. We show how the 1D path-guided SGD implementation is a key step in general pangenome analyses such as pangenome graph linearization and simplification.
Unsorted graph in 1D
PATH-GUIDED STOCHASTIC GRADIENT DESCENT
Our algorithm moves a single pair of nodes at a time, minimizing the disparity between the layout distance of the node pair and the actual nucleotide distance along a path that traverses both nodes.
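A minimal 1D sketch of this pairwise update may help make the idea concrete. It is not the authors' implementation: the function names, the 1/d² weighting, the exponentially decaying step size, the number of updates per iteration, and the choice to measure nucleotide distance between the start offsets of two path steps are all assumptions made for illustration. Node orientation is ignored.

```python
import random

def path_offsets(node_lengths, path):
    """Nucleotide offset at which each step of the path starts."""
    offsets, total = [], 0
    for node in path:
        offsets.append(total)
        total += node_lengths[node]
    return offsets

def stress(x, node_lengths, paths):
    """Sum of squared relative errors between layout and path distances."""
    total = 0.0
    for path in paths.values():
        off = path_offsets(node_lengths, path)
        for i in range(len(path)):
            for j in range(i + 1, len(path)):
                d = off[j] - off[i]
                if d > 0 and path[i] != path[j]:
                    total += ((abs(x[path[i]] - x[path[j]]) - d) / d) ** 2
    return total

def path_guided_sgd_1d(node_lengths, paths, x, iterations=30, eps=0.01, seed=0):
    """Hypothetical helper: returns new 1D positions, one pair update at a time."""
    rng = random.Random(seed)
    x = dict(x)  # do not modify the caller's starting layout
    names = [n for n, p in paths.items() if len(p) > 1]
    offsets = {n: path_offsets(node_lengths, paths[n]) for n in names}
    d_max = max(offsets[n][-1] for n in names)
    # Assumed schedule: step size decays exponentially from d_max^2 to eps.
    eta_max, eta_min = float(d_max) ** 2, eps
    updates = 10 * sum(len(paths[n]) for n in names)
    for t in range(iterations):
        eta = eta_max * (eta_min / eta_max) ** (t / max(iterations - 1, 1))
        for _ in range(updates):
            # Sample a path, then two distinct steps on that path.
            name = rng.choice(names)
            i, j = rng.sample(range(len(paths[name])), 2)
            a, b = paths[name][i], paths[name][j]
            d = abs(offsets[name][i] - offsets[name][j])
            if a == b or d == 0:
                continue  # a looping path may revisit the same node
            mu = min(eta / d ** 2, 1.0)  # capped step, weight 1/d^2
            diff = x[a] - x[b]
            direction = 1.0 if diff >= 0 else -1.0
            # Move both nodes so their layout distance approaches d.
            delta = mu * (abs(diff) - d) / 2
            x[a] -= direction * delta
            x[b] += direction * delta
    return x

# Toy graph: a single-base bubble (nodes 2 and 3) between two 3 bp flanks.
node_lengths = {1: 3, 2: 1, 3: 1, 4: 3}
paths = {"ref": [1, 2, 4], "alt": [1, 3, 4]}
start_rng = random.Random(1)
start = {n: start_rng.random() for n in node_lengths}
layout = path_guided_sgd_1d(node_lengths, paths, start)
before = stress(start, node_lengths, paths)
after = stress(layout, node_lengths, paths)
```

In this toy example every node starts inside the unit interval, so the flanking nodes begin far closer together than the 4 bp the paths require, and the pairwise updates pull the layout apart until the stress drops. Sorting nodes by their final position would give a 1D node order.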
Graph Layout by Path-Guided Stochastic Gradient Descent
GRAPH VISUALIZATIONS EXPLAINED
Simon Heumos1*, Andrea Guarracino2*, and Erik Garrison3,4
VARIATION GRAPHS ENCODE PANGENOMES
A pangenome [1] models the full set of genomic elements in a given species or clade. It can be encoded efficiently [2] as a variation graph, which embeds the linear sequences of the pangenome as paths in the graph itself.
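As a rough illustration of this encoding, the sketch below stores node sequences, edges, and named paths, and spells out the linear sequence that each path embeds. The class and method names are hypothetical and do not correspond to any particular library; strand orientation, which real variation graphs track, is left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class VariationGraph:
    """Toy variation graph: forward-strand nodes only (an assumption)."""
    sequences: dict = field(default_factory=dict)  # node id -> DNA sequence
    edges: set = field(default_factory=set)        # (from node, to node)
    paths: dict = field(default_factory=dict)      # path name -> node ids

    def add_node(self, node_id, sequence):
        self.sequences[node_id] = sequence

    def add_path(self, name, node_ids):
        # Embedding a path also adds the edges it walks over.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))
        self.paths[name] = list(node_ids)

    def spell(self, name):
        """Linear sequence obtained by walking the named path."""
        return "".join(self.sequences[n] for n in self.paths[name])

graph = VariationGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "G"), (4, "TTA")]:
    graph.add_node(node_id, seq)
graph.add_path("ref", [1, 2, 4])  # carries T at the variable site
graph.add_path("alt", [1, 3, 4])  # carries G at the variable site

ref_seq = graph.spell("ref")
alt_seq = graph.spell("alt")
```

The two paths share nodes 1 and 4 and differ only in the middle node, so a single-base difference between two genomes appears as a small bubble in the graph.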
https://bit.ly/PangenomeGraph
https://bit.ly/OptimizedDynamicGraphImplementation
FUTURE WORK
Intermediate snapshots in 1D
1Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany, 2University of Rome Tor Vergata, Via della Ricerca Scientifica 1, Rome, Italy, 3Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA, 4Biomolecular Engineering and Bioinformatics, University of California Santa Cruz, Santa Cruz, CA, USA
*Contributed equally.
References
[2] Jeizenga et al.
Acknowledgements
We thank Vincenza Colonna for organizing the Crusco Summer Hackathon and the Forentum Ritrovato museum for hosting it. We thank the deNBI cloud for providing computational resources. S.H. acknowledges funding from the Central Innovation Programme (ZIM) for SMEs of the Federal Ministry for Economic Affairs and Energy of Germany.
Unsorted graph in 2D
Sorted graph in 2D
Intermediate snapshots in 2D
[4] Zheng et al.
Sorted graph in 1D
Questions