1 of 93

Minimal Cycle Representatives in Persistent Homology using Linear Programming

Lori Ziegelmeier

Macalester College

AATRN Vietoris-Rips Seminar

September 30, 2022

2 of 93

Motivation

3 of 93

 

Data

(in the form of a topological space)

4 of 93

 

 

Cycle representative

(of a homology class)

5 of 93

 

 

Cycle representative

(of a different homology class)

6 of 93

 

 

 

A tighter representative

7 of 93

 

 

 

PLAN

8 of 93

Collaborators:

Paper

Minimal Cycle Representatives in Persistent Homology using Linear Programming: an Empirical Study with User’s Guide

https://arxiv.org/pdf/2105.07025.pdf

https://www.frontiersin.org/articles/10.3389/frai.2021.681117/full

Code

https://github.com/ TDAMinimalGeneratorResearch/minimal-generator

https://github.com/UDATG/optimal_reps

Related papers

I. Obayashi. "Volume-optimal cycle: Tightest representative cycle of a generator in persistent homology." SIAM Journal on Applied Algebra and Geometry 2.4 (2018): 508-534.

E. Escolar and Y. Hiraoka. "Optimal cycles for persistent homology via linear programming." Optimization in the Real World. Springer, Tokyo, 2016. 79-96.

H. Hang, C. Giusti, L. Ziegelmeier, and G. Henselman-Petrusek. “U-match factorization: sparse homological algebra, lazy cycle representatives, and dualities in persistent (co)homology.” 2021, https://arxiv.org/pdf/2108.08831.pdf

C. Giusti, C. Henselman-Petrusek, and L. Ziegelmeier. “The basis matching complex: a streamlined framework for persistent homological algebra,” forthcoming.

ExHACT PIs

Lu Li Connor Thompson Greg Henselman-Petrusek Chad Giusti

9 of 93

Background

10 of 93

motivation: finding loops in a topological space

What is homology?

 

Boundary

matrix

DX

Topological space

X

Since

And Homology

Zk := k-cycles

Bk := k-boundaries

choice of field

 

11 of 93

motivation: mimic the mind’s eye with a union of balls

what you see

what you tell your computer

What is persistent homology?

what you construe

12 of 93

radius = 0

radius = 0.5

radius = 1

What is persistent homology?

problem: topology depends on choice of radius

13 of 93

 

radius

 

 

 

nested

sequence

of spaces

 

14 of 93

 

 

 

 

 

idealized

model

 

 

 

 

15 of 93

 

 

 

 

 

 

idealized

model

 

 

“lifespan" of

 

 

 

 

 

 

 

 

 

 

 

cycle

appears

cycle

vanishes ( 0 = [A]∈ H1(X) )

1-boundaries

1-cycles

16 of 93

 

 

 

 

 

idealized

model

 

 

 

 

barcode

 

lifespan of

lifespan of

 

 

 

 

17 of 93

 

radius

 

barcode

 

lifespan of

lifespan of

nested

sequence

of spaces

more “robust”

 

 

 

 

18 of 93

motivation: matrix factorization

How to compute persistence?

Standard Algorithm

Cohomology Algorithm

V, invertible &

upper triangular

R, “reduced”, every nonzero column has lowest entry in distinct row

Aside: ExHACT allows users to compute each of these (and several other related) matrices

Check out:

H. Hang, C. Giusti, L. Ziegelmeier, and G. Henselman-Petrusek. “U-match factorization: sparse homological algebra,�lazy cycle representatives, and dualities in persistent (co)homology.” 2021, https://arxiv.org/pdf/2108.08831.pdf

19 of 93

Another Aside: Basis Matching Complex

C. Giusti, C. Henselman-Petrusek, and L. Ziegelmeier. “The basis matching complex: a streamlined framework for persistent homological algebra,” forthcoming.

20 of 93

Cycle Representatives

21 of 93

When do “good” cycle representatives matter?

 

lifespan of B

lifespan of A

 

  • you need to locate a feature in some data
  • you need to visualize a topological feature
  • you prefer smooth curves
  • geometry matters

Would you prefer A or B?

22 of 93

When do “good” cycle representatives matter?

3d reconstruction of cardiac tissue

Righthand column shows application of “tight” cycle representatives to counterbalance automated smoothing priors; blue signifies higher reconstruction error

Source: Wu, Pengxiang, et al. "Optimal topological cycles and their application in cardiac trabeculae restoration." International Conference on Information Processing in Medical Imaging. Springer, Cham, 2017.

Example:

Analyze chromatin interaction to study folding

Loops represent different types of chromatin interactions (pictured). Annotate particular loops by localizing a cycle.

Example:

Source: Emmett, Kevin, et al. “Multiscale Topology of Chromatin Folding." Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies. 2016.

23 of 93

What makes a cycle representative “good”?

simple closed curves - where possible, it’s convenient to represent a feature as a simple closed curve (connections to graph theory)

 

 

tightness - tight representatives help to localize holes in data

 

 

24 of 93

What makes a cycle representative “good”?

25 of 93

Example Filtration

26 of 93

A

B

D

E

F

G

C

AC

BC

A B C D E F G

27 of 93

A

B

D

E

F

G

C

A B C D E F G

AC

BC

28 of 93

A

B

D

E

F

G

C

AB

EF

FG

AC

BC

A B C D E F G

 

 

29 of 93

A

B

D

E

F

G

C

ABC

AC

BC

AB

EF

FG

A B C D E F G

 

 

30 of 93

A

B

D

E

F

G

C

AG

BD

ABC

AC

BC

AB

EF

FG

A B C D E F G

31 of 93

A

B

D

E

F

G

C

DE

ABC

AG

BD

AC

BC

AB

EF

FG

A B C D E F G

32 of 93

 

A

B

D

E

F

G

C

ABC

AG

BD

AC

BC

AB

EF

FG

DE

A B C D E F G

33 of 93

 

A

B

D

E

F

G

C

ABC

AG

BD

AC

BC

AB

EF

FG

DE

A B C D E F G

34 of 93

 

A

B

D

E

F

G

C

CD

ABC

AG

BD

AC

BC

AB

EF

FG

DE

A B C D E F G

35 of 93

A

B

D

E

F

G

C

BCD

ABC

AG

BD

AC

BC

AB

EF

FG

DE

CD

A B C D E F G

36 of 93

A

B

D

E

F

G

C

GC

ABC

AG

BD

BCD

AC

BC

AB

EF

FG

DE

CD

A B C D E F G

37 of 93

A

B

D

E

F

G

C

ACG

A B C D E F G

ABC

AG

BD

BCD

GC

AC

BC

AB

EF

FG

DE

CD

38 of 93

A

B

D

E

F

G

C

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

AC

BC

AB

EF

FG

DE

CD

CE

39 of 93

A

B

D

E

F

G

C

CED

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

CE

AC

BC

AB

EF

FG

DE

CD

40 of 93

A

B

D

E

F

G

C

CF

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

CE

CED

AC

BC

AB

EF

FG

DE

CD

41 of 93

A

B

D

E

F

G

C

CEF

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

CE

CED

CF

AC

BC

AB

EF

FG

DE

CD

CFG

42 of 93

A

B

D

E

F

G

C

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

CE

CED

CF

CEF

CFG

AC

BC

AB

EF

FG

DE

CD

43 of 93

A

B

D

E

F

G

C

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

CE

CED

CF

CEF

CFG

AC

BC

AB

EF

FG

DE

CD

44 of 93

A B C D E F G

ABC

AG

BD

BCD

GC

ACG

CE

CED

CF

CEF

CFG

AC

BC

AB

EF

FG

DE

CD

 

dim = 0

dim = 1

dim = 2

Simplex-wise Filtration

45 of 93

Minimal Cycle Representatives

46 of 93

Example

POINT CLOUD

FILTRATION

BARCODE

REPRESENTATIVE

FOR A SINGLE BAR

 

 

47 of 93

Minimizing a Persistent Cycle Representative

  •  

Related papers

E. Escolar and Y. Hiraoka. "Optimal cycles for persistent homology via linear programming." Optimization in the Real World. Springer, Tokyo, 2016. 79-96.

I. Obayashi. "Volume-optimal cycle: Tightest representative cycle of a generator in persistent homology." SIAM Journal on Applied Algebra and Geometry 2.4 (2018): 508-534.

?

 

dimension-1

48 of 93

D

A

B

E

F

G

C

D

A

B

E

F

G

C

 

AG

BD

DE

CD

GC

CE

AC

BC

AB

EF

FG

1

-1

0

1

1

1

1

1

0

0

0

CF

 

 

 

 

 

-1

1

1

0

0

0

0

0

0

0

0

 

+

Minimize Number of Edges

49 of 93

Minimize Number of Edges

D

A

B

E

F

G

C

D

A

B

E

F

G

C

 

AG

BD

DE

CD

GC

CE

AC

BC

AB

EF

FG

1

-1

0

1

1

1

1

1

0

0

0

CF

 

 

 

 

 

-1

1

1

0

0

0

0

0

0

0

0

 

+

0

0

1

1

1

1

1

1

0

0

0

=

 

50 of 93

Minimize Number of Edges

D

A

B

E

F

G

C

D

A

B

E

F

G

C

 

AG

BD

DE

CD

GC

CE

AC

BC

AB

EF

FG

1

-1

0

1

1

1

1

1

0

0

0

CF

 

 

 

 

 

-1

1

1

0

0

0

0

0

0

0

0

 

+

0

0

1

1

1

1

1

1

0

0

0

=

 

51 of 93

D

A

B

F

G

C

D

A

B

E

F

G

C

D

A

B

E

F

G

C

 

 

Weighted Length of Edges

52 of 93

D

A

B

E

F

G

C

D

A

B

E

F

G

C

 

 

53 of 93

 

Number of Triangles (a cycle bounds)

54 of 93

ABC

AG

BD

DE

CD

BCD

GC

CE

CED

CF

CEF

CFG

AC

BC

AB

EF

FG

ACG

Number of Triangles (a cycle bounds)

 

55 of 93

ABC

AG

BD

DE

CD

BCD

GC

CE

CED

CF

CEF

AC

BC

AB

EF

FG

ACG

F

G

C

Number of Triangles (a cycle bounds)

 

56 of 93

ABC

AG

BD

DE

CD

BCD

GC

CE

CED

CF

AC

BC

AB

EF

FG

ACG

E

F

G

C

Number of Triangles (a cycle bounds)

 

57 of 93

ABC

AG

BD

DE

CD

BCD

GC

CE

CED

CF

AC

BC

AB

EF

FG

A

E

F

G

C

Number of Triangles (a cycle bounds)

 

58 of 93

ABC

AG

BD

DE

CD

BCD

GC

CE

CF

AC

BC

AB

EF

FG

A

D

E

F

G

C

Number of Triangles (a cycle bounds)

 

59 of 93

A

B

D

E

F

G

C

ABC

AG

BD

DE

CD

GC

CE

CF

AC

BC

AB

EF

FG

Number of Triangles (a cycle bounds)

 

60 of 93

A

B

D

E

F

G

C

AG

BD

DE

CD

GC

CE

CF

CEF

AC

BC

AB

EF

FG

ABC

BCD

CED

ACG

CFG

Number of Triangles (a cycle bounds)

 

61 of 93

A

B

D

E

F

G

C

AG

BD

DE

CD

GC

CE

CF

CEF

AC

BC

AB

EF

FG

ABC

BCD

CED

ACG

CFG

 

Number of Triangles (a cycle bounds)

 

62 of 93

A

B

D

E

F

G

C

A

B

D

E

F

G

C

6 Triangles

5 Triangles

Edge-loss

Triangle-loss

63 of 93

A

B

D

E

F

G

C

AG

BD

DE

CD

GC

CE

CF

CEF

AC

BC

AB

EF

FG

ABC

BCD

CED

ACG

CFG

 

Weighted Area of Triangles (a cycle bounds)

64 of 93

A Note About Linear Programming

Linear Programs are of the form:

Mixed Integer Programs are of the form:

Fewer constraints,

Convex optimization

More constraints,

Discrete problem

 

 

 

65 of 93

Summary

 

POINT CLOUD

FILTRATION

BARCODE

REPRESENTATIVE

FOR A SINGLE BAR

 

EIGHT OPTIMIZATION PROBLEMS

 

 

 

 

 

66 of 93

# edges

sum of edge

lengths

sum of (bounding) triangle areas

# bounding

triangles

 

Example

 

 

 

Solutions to Unweighted Problems Often Non-Unique

Unif loss functions use actual 0-norm

67 of 93

Optimized Persistent Homology Cycle Basis

 

 

 

 

 

 

 

 

 

 

 

 

68 of 93

Empirical Analysis of �Cycle Representatives

69 of 93

What happens when we optimize?

  • computational cost
  • value added, as measured by
    • tightness
    • bounding volume
  • structure of the input/output
    • simple closed curves
    • coefficients (does the cycle rep take

values in {-1, 0, 1}?)

70 of 93

DATA SETS

“Real world”

  • Vicsek biological aggregation model; Fractal networks; C. Elegans; Genomic sequences of the HIV virus; Genomic sequences of of H3N2; Human genome; US Congress roll-call voting; Network of network scientists; Klein bottle; Stanford dragon
  • Edge weights Euclidean distance, Hamming distance, linear weight-degree correlations, inverse edge weights, etc.

Random point cloud

  • Normal, exponential, gamma, logistic distributions on Euclidean n-space for n=2,…,10
  • Edge weights Euclidean distance

Erdös-Rényi

  • No corresponding point cloud (may not satisfy the triangle inequality)
  • Edge weights Uniform random

71 of 93

Computation

  • A novel solver for persistence with Q coefficients

  • Formatting of inputs to linear programs

  • Wrappers for linear solvers (Gurobi and GLPK)

  • Available at:

https://github.com/TDAMinimalGeneratorResearch/minimal-generator

72 of 93

Computation time

73 of 93

Timing

 

Real-World

Synthetic

Take-away

  • Usually (5 of 11 edge-loss real world, 7 of 11 triangle, and all synthetic),

74 of 93

Dimension of ambient Euclidean space

Take-away

Solver matters!!!

75 of 93

Timing

 

Real-World

Synthetic

Take-away

  • Usually (5 of 11 edge-loss real world, 7 of 11 triangle, and all synthetic),

 

 

  • Usually (60.11% of representatives),
  • Always,

76 of 93

 

 

 

 

box plots

red line represents

ratio of 1

scatter plots

77 of 93

Average size

What’s normal?

78 of 93

Dimension of ambient Euclidean space

Most cycles start small

(So there’s little room for improvement)

Most random Euclidean point clouds have small PH cycle representatives.

79 of 93

Effectiveness of optimization

 

 

high compression ratio

low compression ratio

80 of 93

Effectiveness of optimization

 

 

 

 

compression ratio ~

81 of 93

 

 

 

 

 

 

 

 

 

 

 

 

 

Randomly generated point clouds

ceiling effect

compression

ratio

82 of 93

 

 

 

“Real world” data sets

83 of 93

 

 

 

“Real world” data sets

84 of 93

Length versus bounded area

Minimizing length does not necessarily minimize area — but it usually does

85 of 93

 

 

 

Take-away

  • Can replace uniform-weighted minimal cycles with length-weighted minimal cycles
    • 63% uniform-minimal are also length minimal
    • 99% length minimal are also uniform-minimal
  • Area-optimal cycle representatives may not enclose the smallest bounding area (last row)
  • Area-minimal cycles tend to have near-optimal length, and length-minimal cycles tend to have near-optimal area.
  • Depending on your goal, it may work well to choose the method that runs the fastest.

cycle minimization

area minimization

ratio of compression (weighted length of edges)

ratio of compression

(# edges)

ratio of compression

(bounded area)

86 of 93

Length versus area

Surprisingly, methods designed to minimize length can reduce area better than methods designed to minimize area

Original

Optimized for # of edges

Optimized for total area of bounding triangles

Source of the discrepancy

87 of 93

Counting loops

in the support of a cycle

 

 

88 of 93

 

The only random space to generate multi-loop persistent cycle representatives in large numbers

 

89 of 93

 

 

 

90 of 93

Coefficients

 

91 of 93

 

 

Findings

Advantages of …

Take-away

Can drop the integral constraint (to save on computational costs) and still achieve integral solution!

92 of 93

 

SUMMARY FOR THE PRACTITIONER

Questions for the researcher: Why?

93 of 93

Questions?

Collaborators:

Lu Li Connor Thompson Greg Henselman-Petrusek Chad Giusti

Lori Ziegelmeier

lziegel1@Macalester.edu

Department of Mathematics, Statistics, and Computer Science

Supported by NSF:CDS&E-MSS-1854703

Join WinCompTop:

Send blank email to

WinCompTop+subscribe@googlegroups.com

Paper

Minimal Cycle Representatives in Persistent Homology using Linear Programming: an Empirical Study with User’s Guide

https://www.frontiersin.org/articles/10.3389/frai.2021.681117/full

Code

https://github.com/ TDAMinimalGeneratorResearch/minimal-generator

https://github.com/UDATG/optimal_reps

Related papers

I. Obayashi. "Volume-optimal cycle: Tightest representative cycle of a generator in persistent homology." SIAM Journal on Applied Algebra and Geometry 2.4 (2018): 508-534.

E. Escolar and Y. Hiraoka. "Optimal cycles for persistent homology via linear programming." Optimization in the Real World. Springer, Tokyo, 2016. 79-96.

H. Hang, C. Giusti, L. Ziegelmeier, and G. Henselman-Petrusek. “U-match factorization: sparse homological algebra, lazy cycle representatives, and dualities in persistent (co)homology.” 2021, https://arxiv.org/pdf/2108.08831.pdf

C. Giusti, C. Henselman-Petrusek, and L. Ziegelmeier. “The basis matching complex: a streamlined framework for persistent homological algebra,” forthcoming.

Qingru Zhang