1 of 80

Discrete Topological methods for visual-analytics of multivariate data

Sara Scaramuccia

Dip. Matematica – Università di Roma Tor Vergata, Italy Sep 8th 2023

Joint work with Claudia Landi, Federico Iuricich, Leila De Floriani

2 of 80

Multiple scalar fields

  •  

Hurricane Isabel: pressure field

Hurricane Isabel: temperature field

3 of 80

Overview

  • topology for Data Analysis
    • qualitative, robust, scale-free
    • applications in dimensionality reduction, clustering (including learning methods), visual-analytics

  • univariate data
    • connected components on levels -> Reeb graph
    • higher-dimensional loops on sublevels -> PH
    • discrete gradient fields, persistent homology, relative perfectness
    • applications to terrain segmentation
  • multivariate data
    • discrete gradient fields, multiparameter persistent homology, relative perfectness
    • capturing opponent scalar field behaviors
  • Pareto
    • smooth case
    • discrete case
    • preliminary results
  • results
    • computing: algorithm with improving timings
    • visualizing hurricane / engine
  • conclusions
    • theoretical issues still to be clarified: equal behavior areas...
    • computability feasible but requires suitable data structures for domain triangulations
    • applications: .....

4 of 80

Overview

  • topology for Data Analysis
    • Homology
    • Persistent Homology
    • (discrete) Morse Theory and perfectness
  • univariate data case
    • Persistent Homology and Morse Theory
    • Relative-perfectness
  • multivariate data case
    • Multiparameter Persistent Homology
    • Multiparameter relative-perfectness
    • Pareto points/simplices

5 of 80

Topological Data Analysis

Qualitative dimension-independent analysis of large-sized data

    • Robust to noise and outliers
    • Handling of high-dimensional data
    • Classifying data too big to be processed
    • Metric-independence
    • Coordinate system-independence
    • Summaries of data rather than single parameter choices

shaping point clouds

scalar field segmentations

abstract graphs

6 of 80

Topology

Some topological invariants

7 of 80

Homology

p=0 : number of connected components

p=1 : number of independent loops

p=2 : number of independent cavities

Torus

1 : connected component

2 : independent loops

1 : independent cavity

Counts the number of independent p-dimensional holes (p≥0)

8 of 80

Discrete domains

Geometric simplices

Abstract simplices

9 of 80

From data to shapes

Simplicial complexes

 

 

10 of 80

Homology

Simplicial homology

11 of 80

Topological invariant limitations

12 of 80

Topological invariant limitations

13 of 80

From data to shapes

Filtrations from point clouds

14 of 80

From data to shapes

Vietoris-Rips complexes

15 of 80

Persistent Homology

Persistent homology is the main tool in TDA and allows for

tracking the change in homology of the shape in an increasing object

Long-living homology classes are assumed to be more important than other ones possibly associated with noise

16 of 80

Persistent Homology

Persistence pairs

17 of 80

Persistent Homology

Persistence pairs

18 of 80

Persistent Homology

Peristence module

large, non-homogeneous, multidimensional, abstract

  • e.g., point clouds, images, (weighted) graphs and scalar fields, time series, etc ...

filtration of simplicial complexes

algebraic invariants associated to the persistence module

19 of 80

a descriptor for Persistent Homology

19

Barcode

Persistence diagram

barcode – persistence diagram

20 of 80

Classification via Topological Features

Estimating via Persistence Diagrams: topological classifier [Chao et al., 2018], persistence-based linkage for hierarchical clustering [De Lara, 2022]

Learning Persistence Diagrams Persistent Homology layer [Brüel-Gabrielsson et al., 2020], TopoAutoEncoder [Moor et al., 2020]

Feature Extraction via Persistence Diagram: Giotto-tda library examples

with and without topological regularization

embedding of nested high-dimensional spheres

introduction of a penalty on the robustness of the connected components in the decision boundary

a penalty on the last appearing homological classes

Features extracted from point labeled 8 in MNIST dataset

vectorizations of collections of life-spans of homology classes to handle averages and feed supervised classifications

21 of 80

Python Lab

Examples

Basic tools in TDA: giotto TDA library giotto tda https://github.com/giotto-ai/giotto-tda

Classification of point clouds through persistence entropy and random forest method

A use case: Classification of hand-written digits

applications of TDA on the DONUT database https://donut.topology.rocks/?q=author%3A%22B.+Rieck%22

22 of 80

Morse Theory – main ideas

  •  

 

23 of 80

Discrete Morse functions

23

30/09/2021

 

24 of 80

Morse Theory – glossary

24

 

 

 

 

 

25 of 80

Discrete Translation

25

30/09/2021

 

 

 

 

 

introduced in [Forman, 1998]

26 of 80

The Morse complex

  •  

27 of 80

Discrete gradients - idea

 

 

 

28 of 80

Main results in (discrete) Morse Theory

 

 

 

 

 

 

29 of 80

Incidence between critical cells

29

30/09/2021

 

 

 

 

 

 

 

30 of 80

The Morse complex is a chain complex ...

30

30/09/2021

 

 

 

31 of 80

The Morse complex is a chain complex ...

31

30/09/2021

 

 

 

32 of 80

Results in DMT

32

30/09/2021

 

 

 

 

 

 

33 of 80

Morse Perfectness

30/09/2021

 

 

34 of 80

Perfectness NOT always possible

Example: the dunce hat

its homology is trivial

BUT

any discrete gradient gives more than one critical vertex

35 of 80

Discrete gradients and Morse complexes

35

23/05/2018

A (Forman) discrete gradient 𝑉 is a collection of pairs (𝝈,𝝉) s.t.

  • each cell belongs to at most one (𝝈,𝝉)

A 𝑉-path is a sequence of pairs (𝝈𝒊,𝝉𝒊) in 𝑉

𝝈0,𝝉0, 𝝈1,𝝉1,…, 𝝈r,𝝉r

such that for all 𝒊 = 0,…,r-1

𝝈𝒊≠ 𝝉𝒊+1 and 𝝉𝒊+1> 𝝈𝒊

  • 𝑉 has no cyclic 𝑉-paths

36 of 80

Discrete gradients and Morse complexes

36

23/05/2018

Cells out of 𝑉 are called critical

Incidience among critical cells is obtained by following the connecting 𝑉-paths

The so-obtained cell complex is called discrete Morse complex

37 of 80

Discrete gradients and Morse complexes

37

23/05/2018

By following the gradient paths one can obtain a smaller complex whose cells are only the critical ones

The Morse complex might be seen as a simplified shape with the same homology as the given one

38 of 80

Univariate Background

  • Persistent Homology and Morse Theory

  • Persistent Homology and Discrete Morse Theory (relative-perfectness)

segmentation of a triangulated terrain into equal start-ending critical cells of the height scalar field

39 of 80

Simplicial Homology

The most applied topological feature in TDA

40 of 80

Reeb graphs

levelset topology

 

Reeb graph

Mapper output

41 of 80

Persistent Homology PH

Captures the homology changes along an increasing sequence of spaces obtained by sublevel sets wrt a scalar function

41

23/05/2018

(for each k≥0)

sublevelset topology

42 of 80

The Morse complex

  •  

42

30/09/2021

43 of 80

Morse and Persistence interplay

 

 

 

 

44 of 80

Morse and Persistence interplay

 

computing the Morse complex is a preprocessing for PH computations

Persistence for Morse-based segmentation:

low persistence pairs are removed and the new incidences are found following the gradient

45 of 80

Univariate case: Discrete Morse Theory and PH

 

 

 

 

 

 

[Mischaikow, Nanda, 2013]

[Allili et al., 2017a]

 

46 of 80

Relative-perfectness for scalar functions

 

Function graphics

barcode

 

positive critical cells correspond to starting points for bars in the barcode

negative critical cells correspond to ending points for bars in the barcode

[Robins et al., 2011]

 

47 of 80

Multivariate Case

  • Multiparameter Persistent Homology

  • Multiparameter Relative Perfectness

  • Pareto critical points/simplices

discrete critical 0-/1-/2-cells of a torus w.r.t. ϕ(x,y,z) = (x,y)

48 of 80

Multiparameter Persistent Homology - MPH

Combines multiple independent increasing family of spaces by considering sublevel sets with respect to multiple functions

(for each k≥0)

  • motivated by multiparameter data (e.g., RGB colors) or comparison of several criteria
  • MPH more informative than PH?

INTERSECTION

49 of 80

MPH more informative than PH

50 of 80

MPH more informative than PH

PH is invariant under horizontal translations of graphics

51 of 80

Mutual locations of features

52 of 80

Mutual Locations of Features

53 of 80

Mutual Locations of Features

54 of 80

Compatibility for multiple parameters

𝑓 : ∑ → ℝ

𝑭 : ∑ → ℝn

[Mischaikow, Nanda, 2013]

[Allili et al., 2017a]

𝑉 compatible with 𝑓 (or 𝑭) ensures that the associated Morse complex has the same

persistent homology

multiparameter persistent homology

55 of 80

Multivariate relative-perfectness

new homology classes can appear by adding no extra cell

critical cell can destroy homology classes not present in any previous step

56 of 80

Defining optimality …

…generalizing to 𝑭 : ∑ → ℝn

𝑉 compatible with 𝑭 is optimal w.r.t 𝑭 if

WARNING: each critical cell is either positive or negative with respect to the relative pair (∑u, ∪∑v) and not to previous steps

…translating Robins et al. Def into equivalent terms

𝑉 compatible with 𝑓 is optimal w.r.t 𝑓 if each critical cell is either positive or negative.

57 of 80

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏 (optimal up to 2-dimensional domains) [S., Iuricich, Landi, De Floriani, 2020]

𝒏=1

𝒏>1

global

local

[Robins et al, 2011]

[Allili et al, 2017a]

[Iuricich, S., Landi, De Floriani, 2016]

58 of 80

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏

ComputeLowerStar

computes the lower star of a cell

𝐿𝑭 (𝝈)={𝝉∈ ∑ | 𝝉 > 𝝈 and 𝑭(𝝉)≼ 𝑭(𝝈)}

HomotopyExpansion

 

TIME COMPLEXITY O(|Star| log(|Star|))

TIME COMPLEXITY O(|Star|)

with |Star|:=worts-case|𝐿𝑭 |

59 of 80

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏

Introduction of a well-extensible indexing 𝐼 over vertexes and extended by 𝐼(𝝈):=maxv∈𝝈 𝐼(v) s.t.

𝑭(𝝈) ≼ 𝑭(𝝉) ⇒ 𝐼(𝝈)𝐼(𝝉)

for each vertex

ComputeLowerStar w.r.t. 𝐼

SplitLowerStar into level sets

HomotopyExpansion over each level set

Original Function

Indexing

(3,0)v

(3,1)e

(3,3)t (3,3)e

(3,2)t (3,2)e

60 of 80

Properties

    • 𝒏>1: discrete gradient compatible with 𝑭
    • local: parallelizable
    • memory: negligible runtime extra space

asymptotical time cost

Optimality

    • 𝑉 is optimal w.r.t. 𝑭 if ∑ is 2-dimensional or embedded in ℝ3
    • also 𝑉 obtained from [Allili et al., 2017] is optimal

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏

61 of 80

Compatible discrete gradients and multivariate visualization

a discrete gradient compatible with two functions captures where the two functions increase in opposite directions

62 of 80

Compatible discrete gradients and multivariate visualization

A point x in a differential manifold 𝑫 is called Pareto critical, w.r.t. a vector-valued function of class C2, iff the gradients of the components are l.d. through positive coefficients

the Pareto critical points of a torus w.r.t. ϕ(x,y,z) = (x,y)

63 of 80

Compatible discrete gradients and multivariate visualization

discrete critical 0-/1-/2-cells of a torus w.r.t. ϕ(x,y,z) = (x,y)

64 of 80

Compatible discrete gradients and multivariate visualization

Triangulation problem detected on the sphere case

65 of 80

Simplifying multivariate data through critical clusters

Extrema clusters of cardinality at least 10

Extrema clusters of cardinality at least 100

Extrema clusters of cardinality at least 400

Extrema clusters of cardinality at least 2000

extrema clusters

defined as the connected components of the intersection of maxima and minima clusters in the incidence graph

  • minima clusters: components formed by incident 0-/1-critical cells
  • maxima clusters: components formed by incident (d-1)-/d-critical cells

66 of 80

Concluding remarks

  • The shape of data captures relevant information and it is feasible in practice with TDA
  • Persistent Homology is the main tool in TDA
  • Persistent Homology is tightly linked to Morse Theory
  • The multivariate case leads to strong connections with the idea of Pareto optima

Warnings:

  • Warning: Topological methods requires simplicial complex data structures (e.g. GUDHI library)

67 of 80

Concluding remarks

  • The shape of data captures relevant information and it is feasible in practice with TDA
  • Persistent Homology is the main tool in TDA
  • Persistent Homology is tightly linked to Morse Theory
  • The multivariate case leads to strong connections with the idea of Pareto optima

Warnings:

  • Warning: Topological methods requires simplicial complex data structures (e.g. GUDHI library)

.....thanks

68 of 80

Additional slides

  • definitions:
    • simplicial complex, cell comlpex
    • homology, persistent homology
  • dunce hat
  • impact on MPH computations
  • Pareto missing
  • Python Lab

69 of 80

Python Lab

Examples

Basic tools in TDA: giotto TDA library giotto tda https://github.com/giotto-ai/giotto-tda

Classification of point clouds through persistence entropy and random forest method

A use case: Classification of hand-written digits

applications of TDA on the DONUT database https://donut.topology.rocks/?q=author%3A%22B.+Rieck%22

70 of 80

PH

Applied, from shape analysis to:

  • biology [MacPherson, Schweinhart, 2010]
  • electoral votes [Rieck and H. Leitte , 2014]
  • network analysis [De Silva,Ghrist, 2017]
  • social networks [MSS06, HMR09, KSSM13, CH13]

Introduced to outliers handling in PH [Carlsson, Zomorodian, 2007]

Applied to shape analysis

  • combining photometric properties [Biasotti et al., ‘PHOG’, 2013]

MPH

Ongoing research in

  • theoretical: proposing descriptors bounding the amount of info
  • computational: efficiency and scalability

Ongoing research in

  • applying PH as a preprocessing for machine learning and statistical data analysis

71 of 80

Discrete Topological Spaces

simplicial complexes

cubical complexes (regular or not)

72 of 80

Discrete Spaces:�regular cell complexes with intersection property

Definition

A cell complex ∑:

  • is regular if all attaching maps are homeomorphisms
  • has the intersection property if any two cells have either empty intersection or they intersect in a single face

simplicial and cubical complexes satisfy the definition

73 of 80

Descriptors for MPH

  • Persistence module

stable, complete and not discrete

  • Persistence space

stable, not complete and not discrete

is the collection of all barcodes B(L) relative to a single direction  L

74 of 80

Computing MPH

Persistence module retrieval

high computational costs, complete information mainly relevant for theoretical purposes

  • Groebner basis [Carlsson-Zomorodian,2005] 
  • TopCat [Gafvert, 2016] open-source at

https://github.com/olivergafvert/topcat

Persistence space retrieval

via foliation method [Biasotti et al., 2008], incomplete information, already applied to shape retrieval [Biasotti et al., ‘PHOG’, 2013]

  • Approximated persistence space [Cerri-Frosini, 2010], 2D optimisation [Biasotti et al., 2011]
  • RIVET 2D-visualisation tool [Lesnick-Wright, 2015], now available at http://rivet.online

75 of 80

MPH preprocessing

Morse-based to reduce the input size

  • for 0-homology [Cerri et al., 2006]
  • for any homology degree [Allili et al., 2017a], [Allili et al., 2017b]

Issues in Morse-based preprocessing for MPH:

  • limitations for real-sized input datasets
  • no impact evaluation w.r.t. MPH
  • no notion of optimal preprocessing

76 of 80

Results: size reduction

|M|

Time G

Time M

scale factor

timings

number of fields

Iuricich, S., Landi, De Floriani: “A discrete Morse-based approach to multivariate data”, Simposium in Visualisation in SIGGRAPH Asia 2016.

number of critical cells left

time to retrieve the gradient V

time to retrieve the complex M from V

7.3x < … < 40x

x,xx minutes

up to 3 …. but independent

77 of 80

Results: comparing to related works

Timings linear over all considered datasets in the number of cells with

very different proportionality coefficients

Memory maximum peaks around 10GB against 25GB

… under submission to Computational Geometry: Theory and Algorithms

… under submission to Computational Geometry: Theory and Algorithms

Global [Allili et al., 2017a]

Ours [Iuricich, S., Landi, De Floriani, 2016]

78 of 80

persistence module computed via Topcat Library [Gafvert, 2016]

original datasets are compared to the reduced via discrete gradient

Grades : the number of cell complexes in the filtration

Time : time needed to perform Topcat routine

… under submission to Computational Geometry: Theory and Algorithms

Persistence module evaluation

79 of 80

Persistence space evaluation�wrt number of linear restrictions

  • the reduction impacts the linearity coefficient w.r.t. number of restrictions
  • even for 4 restriction + reduced timings, reduction is better than original timings

persistence space over 4 to 100 uniformly selected linear restrictions.

original datasets are compared to reduced datasets and reduced plus reduction timings

80 of 80

Persistence space evaluation�wrt different PH computations

Time comaprison of several PH algorithm with and without Morse reduction

Computing the persistence space over 100 slices. Each color indicates results for different PH algorithms

reduced datasets:

max time sum of all slices = 2 seconds

unreduced datasets:

max time sum of all slices = 25 seconds

… under submission to Computational Geometry: Theory and Algorithms