1 of 80

Discrete Topological methods for visual-analytics of multivariate data

Sara Scaramuccia

Dip. Matematica – Università di Roma Tor Vergata, Italy Sep 8^th 2023

Joint work with Claudia Landi, Federico Iuricich, Leila De Floriani

2 of 80

Multiple scalar fields

Hurricane Isabel: pressure field

Hurricane Isabel: temperature field

3 of 80

Overview

topology for Data Analysis

qualitative, robust, scale-free
applications in dimensionality reduction, clustering (including learning methods), visual-analytics

univariate data

connected components on levels -> Reeb graph
higher-dimensional loops on sublevels -> PH
discrete gradient fields, persistent homology, relative perfectness
applications to terrain segmentation

multivariate data

discrete gradient fields, multiparameter persistent homology, relative perfectness
capturing opponent scalar field behaviors

Pareto

smooth case
discrete case
preliminary results

results

computing: algorithm with improving timings
visualizing hurricane / engine

conclusions

theoretical issues still to be clarified: equal behavior areas...
computability feasible but requires suitable data structures for domain triangulations
applications: .....

4 of 80

Overview

topology for Data Analysis

Homology
Persistent Homology
(discrete) Morse Theory and perfectness

univariate data case

Persistent Homology and Morse Theory
Relative-perfectness

multivariate data case

Multiparameter Persistent Homology
Multiparameter relative-perfectness
Pareto points/simplices

5 of 80

Topological Data Analysis

Qualitative dimension-independent analysis of large-sized data

Robust to noise and outliers
Handling of high-dimensional data
Classifying data too big to be processed
Metric-independence
Coordinate system-independence
Summaries of data rather than single parameter choices

shaping point clouds

scalar field segmentations

abstract graphs

6 of 80

Topology

Some topological invariants

7 of 80

Homology

p=0 : number of connected components

p=1 : number of independent loops

p=2 : number of independent cavities

Torus

1 : connected component

2 : independent loops

1 : independent cavity

Counts the number of independent p-dimensional holes (p≥0)

8 of 80

Discrete domains

Geometric simplices

Abstract simplices

9 of 80

From data to shapes

Simplicial complexes

10 of 80

Homology

Simplicial homology

11 of 80

Topological invariant limitations

12 of 80

Topological invariant limitations

13 of 80

From data to shapes

Filtrations from point clouds

14 of 80

From data to shapes

Vietoris-Rips complexes

15 of 80

Persistent Homology

Persistent homology is the main tool in TDA and allows for

tracking the change in homology of the shape in an increasing object

Long-living homology classes are assumed to be more important than other ones possibly associated with noise

16 of 80

Persistent Homology

Persistence pairs

17 of 80

Persistent Homology

Persistence pairs

18 of 80

Persistent Homology

Peristence module

large, non-homogeneous, multidimensional, abstract

e.g., point clouds, images, (weighted) graphs and scalar fields, time series, etc ...

filtration of simplicial complexes

algebraic invariants associated to the persistence module

19 of 80

a descriptor for Persistent Homology

19

Barcode

Persistence diagram

barcode – persistence diagram

20 of 80

Classification via Topological Features

Estimating via Persistence Diagrams: topological classifier [Chao et al., 2018], persistence-based linkage for hierarchical clustering [De Lara, 2022]

Learning Persistence Diagrams Persistent Homology layer [Brüel-Gabrielsson et al., 2020], TopoAutoEncoder [Moor et al., 2020]

Feature Extraction via Persistence Diagram: Giotto-tda library examples

with and without topological regularization

embedding of nested high-dimensional spheres

introduction of a penalty on the robustness of the connected components in the decision boundary

a penalty on the last appearing homological classes

Features extracted from point labeled 8 in MNIST dataset

vectorizations of collections of life-spans of homology classes to handle averages and feed supervised classifications

21 of 80

Python Lab

Examples

Basic tools in TDA: giotto TDA library giotto tda https://github.com/giotto-ai/giotto-tda

Classification of point clouds through persistence entropy and random forest method

A use case: Classification of hand-written digits

applications of TDA on the DONUT database https://donut.topology.rocks/?q=author%3A%22B.+Rieck%22

22 of 80

Morse Theory – main ideas

23 of 80

Discrete Morse functions

23

30/09/2021

24 of 80

Morse Theory – glossary

24

25 of 80

Discrete Translation

25

30/09/2021

introduced in [Forman, 1998]

26 of 80

The Morse complex

27 of 80

Discrete gradients - idea

28 of 80

Main results in (discrete) Morse Theory

29 of 80

Incidence between critical cells

29

30/09/2021

30 of 80

The Morse complex is a chain complex ...

30

30/09/2021

31 of 80

The Morse complex is a chain complex ...

31

30/09/2021

32 of 80

Results in DMT

32

30/09/2021

33 of 80

Morse Perfectness

30/09/2021

34 of 80

Perfectness NOT always possible

Example: the dunce hat

its homology is trivial

BUT

any discrete gradient gives more than one critical vertex

35 of 80

Discrete gradients and Morse complexes

35

23/05/2018

A (Forman) discrete gradient 𝑉 is a collection of pairs (𝝈,𝝉) s.t.

each cell belongs to at most one (𝝈,𝝉)

A 𝑉-path is a sequence of pairs (𝝈_𝒊,𝝉_𝒊) in 𝑉

𝝈₀,𝝉_0, 𝝈₁,𝝉₁,…, 𝝈_r,𝝉_r

such that for all 𝒊 = 0,…,r-1

𝝈_𝒊≠ 𝝉_𝒊+1 and 𝝉_𝒊+1> 𝝈_𝒊

𝑉 has no cyclic 𝑉-paths

36 of 80

Discrete gradients and Morse complexes

36

23/05/2018

Cells out of 𝑉 are called critical

Incidience among critical cells is obtained by following the connecting 𝑉-paths

The so-obtained cell complex is called discrete Morse complex

37 of 80

Discrete gradients and Morse complexes

37

23/05/2018

By following the gradient paths one can obtain a smaller complex whose cells are only the critical ones

The Morse complex might be seen as a simplified shape with the same homology as the given one

38 of 80

Univariate Background

Persistent Homology and Morse Theory

Persistent Homology and Discrete Morse Theory (relative-perfectness)

segmentation of a triangulated terrain into equal start-ending critical cells of the height scalar field

39 of 80

Simplicial Homology

The most applied topological feature in TDA

40 of 80

Reeb graphs

levelset topology

Reeb graph

Mapper output

41 of 80

Persistent Homology PH

Captures the homology changes along an increasing sequence of spaces obtained by sublevel sets wrt a scalar function

41

23/05/2018

(for each k≥0)

sublevelset topology

42 of 80

The Morse complex

42

30/09/2021

43 of 80

Morse and Persistence interplay

44 of 80

Morse and Persistence interplay

computing the Morse complex is a preprocessing for PH computations

Persistence for Morse-based segmentation:

low persistence pairs are removed and the new incidences are found following the gradient

45 of 80

Univariate case: Discrete Morse Theory and PH

[Mischaikow, Nanda, 2013]

[Allili et al., 2017a]

46 of 80

Relative-perfectness for scalar functions

Function graphics

barcode

positive critical cells correspond to starting points for bars in the barcode

negative critical cells correspond to ending points for bars in the barcode

[Robins et al., 2011]

47 of 80

Multivariate Case

Multiparameter Persistent Homology

Multiparameter Relative Perfectness

Pareto critical points/simplices

discrete critical 0-/1-/2-cells of a torus w.r.t. ϕ(x,y,z) = (x,y)

48 of 80

Multiparameter Persistent Homology - MPH

Combines multiple independent increasing family of spaces by considering sublevel sets with respect to multiple functions

(for each k≥0)

motivated by multiparameter data (e.g., RGB colors) or comparison of several criteria
MPH more informative than PH?

INTERSECTION

49 of 80

MPH more informative than PH

50 of 80

MPH more informative than PH

PH is invariant under horizontal translations of graphics

51 of 80

Mutual locations of features

52 of 80

Mutual Locations of Features

53 of 80

Mutual Locations of Features

54 of 80

Compatibility for multiple parameters

𝑓 : ∑ → ℝ

𝑭 : ∑ → ℝⁿ

[Mischaikow, Nanda, 2013]

[Allili et al., 2017a]

𝑉 compatible with 𝑓 (or 𝑭) ensures that the associated Morse complex has the same

persistent homology

multiparameter persistent homology

55 of 80

Multivariate relative-perfectness

new homology classes can appear by adding no extra cell

critical cell can destroy homology classes not present in any previous step

56 of 80

Defining optimality …

…generalizing to 𝑭 : ∑ → ℝⁿ

𝑉 compatible with 𝑭 is optimal w.r.t 𝑭 if

WARNING: each critical cell is either positive or negative with respect to the relative pair (∑^u, ∪∑^v) and not to previous steps

…translating Robins et al. Def into equivalent terms

𝑉 compatible with 𝑓 is optimal w.r.t 𝑓 if each critical cell is either positive or negative.

57 of 80

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ^𝒏 (optimal up to 2-dimensional domains) [S., Iuricich, Landi, De Floriani, 2020]

𝒏=1

𝒏>1

global

local

[Robins et al, 2011]

[Allili et al, 2017a]

[Iuricich, S., Landi, De Floriani, 2016]

58 of 80

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ^𝒏

ComputeLowerStar

computes the lower star of a cell

𝐿_𝑭(𝝈)={𝝉∈ ∑ | 𝝉 > 𝝈 and 𝑭(𝝉)≼ 𝑭(𝝈)}

HomotopyExpansion

TIME COMPLEXITY O(|Star| log(|Star|))

TIME COMPLEXITY O(|Star|)

with |Star|:=worts-case|𝐿_𝑭|

59 of 80

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ^𝒏

Introduction of a well-extensible indexing 𝐼 over vertexes and extended by 𝐼(𝝈):=max_v∈𝝈𝐼(v) s.t.

𝑭(𝝈) ≼ 𝑭(𝝉) ⇒ 𝐼(𝝈) ≤ 𝐼(𝝉)

for each vertex

ComputeLowerStar w.r.t. 𝐼

SplitLowerStar into level sets

HomotopyExpansion over each level set

Original Function

Indexing

(3,0)_v

(3,1)_e

(3,3)_t (3,3)_e

(3,2)_t (3,2)_e

60 of 80

Properties

𝒏>1: discrete gradient compatible with 𝑭
local: parallelizable
memory: negligible runtime extra space

asymptotical time cost

Optimality

𝑉 is optimal w.r.t. 𝑭 if ∑ is 2-dimensional or embedded in ℝ³
also 𝑉 obtained from [Allili et al., 2017] is optimal

Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ^𝒏

61 of 80

Compatible discrete gradients and multivariate visualization

a discrete gradient compatible with two functions captures where the two functions increase in opposite directions

62 of 80

Compatible discrete gradients and multivariate visualization

A point x in a differential manifold 𝑫 is called Pareto critical, w.r.t. a vector-valued function of class C², iff the gradients of the components are l.d. through positive coefficients

the Pareto critical points of a torus w.r.t. ϕ(x,y,z) = (x,y)

63 of 80

Compatible discrete gradients and multivariate visualization

discrete critical 0-/1-/2-cells of a torus w.r.t. ϕ(x,y,z) = (x,y)

64 of 80

Compatible discrete gradients and multivariate visualization

Triangulation problem detected on the sphere case

65 of 80

Simplifying multivariate data through critical clusters

Extrema clusters of cardinality at least 10

Extrema clusters of cardinality at least 100

Extrema clusters of cardinality at least 400

Extrema clusters of cardinality at least 2000

extrema clusters

defined as the connected components of the intersection of maxima and minima clusters in the incidence graph

minima clusters: components formed by incident 0-/1-critical cells
maxima clusters: components formed by incident (d-1)-/d-critical cells

66 of 80

Concluding remarks

The shape of data captures relevant information and it is feasible in practice with TDA
Persistent Homology is the main tool in TDA
Persistent Homology is tightly linked to Morse Theory
The multivariate case leads to strong connections with the idea of Pareto optima

Warnings:

Warning: Topological methods requires simplicial complex data structures (e.g. GUDHI library)

67 of 80

Concluding remarks

The shape of data captures relevant information and it is feasible in practice with TDA
Persistent Homology is the main tool in TDA
Persistent Homology is tightly linked to Morse Theory
The multivariate case leads to strong connections with the idea of Pareto optima

Warnings:

Warning: Topological methods requires simplicial complex data structures (e.g. GUDHI library)

.....thanks

68 of 80

Additional slides

definitions:

simplicial complex, cell comlpex
homology, persistent homology

dunce hat
impact on MPH computations
Pareto missing
Python Lab

69 of 80

Python Lab

Examples

Basic tools in TDA: giotto TDA library giotto tda https://github.com/giotto-ai/giotto-tda

Classification of point clouds through persistence entropy and random forest method

A use case: Classification of hand-written digits

applications of TDA on the DONUT database https://donut.topology.rocks/?q=author%3A%22B.+Rieck%22

contacts

sara.scaramuccia@uniroma2.it

70 of 80

PH

Applied, from shape analysis to:

biology [MacPherson, Schweinhart, 2010]
electoral votes [Rieck and H. Leitte , 2014]
network analysis [De Silva,Ghrist, 2017]
social networks [MSS06, HMR09, KSSM13, CH13]

Introduced to outliers handling in PH [Carlsson, Zomorodian, 2007]

Applied to shape analysis

combining photometric properties [Biasotti et al., ‘PHOG’, 2013]

MPH

Ongoing research in

theoretical: proposing descriptors bounding the amount of info
computational: efficiency and scalability

Ongoing research in

applying PH as a preprocessing for machine learning and statistical data analysis

71 of 80

Discrete Topological Spaces

simplicial complexes

cubical complexes (regular or not)

72 of 80

Discrete Spaces:�regular cell complexes with intersection property

Definition

A cell complex ∑:

is regular if all attaching maps are homeomorphisms
has the intersection property if any two cells have either empty intersection or they intersect in a single face

simplicial and cubical complexes satisfy the definition

73 of 80

Descriptors for MPH

Persistence module

stable, complete and not discrete

Persistence space

stable, not complete and not discrete

is the collection of all barcodes B(L) relative to a single direction L

74 of 80

Computing MPH

Persistence module retrieval

high computational costs, complete information mainly relevant for theoretical purposes

Groebner basis [Carlsson-Zomorodian,2005]
TopCat [Gafvert, 2016] open-source at

https://github.com/olivergafvert/topcat

Persistence space retrieval

via foliation method [Biasotti et al., 2008], incomplete information, already applied to shape retrieval [Biasotti et al., ‘PHOG’, 2013]

Approximated persistence space [Cerri-Frosini, 2010], 2D optimisation [Biasotti et al., 2011]
RIVET 2D-visualisation tool [Lesnick-Wright, 2015], now available at http://rivet.online

75 of 80

MPH preprocessing

Morse-based to reduce the input size

for 0-homology [Cerri et al., 2006]
for any homology degree [Allili et al., 2017a], [Allili et al., 2017b]

Issues in Morse-based preprocessing for MPH:

limitations for real-sized input datasets
no impact evaluation w.r.t. MPH
no notion of optimal preprocessing

76 of 80

Results: size reduction

|M|

Time G

Time M

scale factor

timings

number of fields

Iuricich, S., Landi, De Floriani: “A discrete Morse-based approach to multivariate data”, Simposium in Visualisation in SIGGRAPH Asia 2016.

number of critical cells left

time to retrieve the gradient V

time to retrieve the complex M from V

7.3x < … < 40x

x,xx minutes

up to 3 …. but independent

77 of 80

Results: comparing to related works

Timings linear over all considered datasets in the number of cells with

very different proportionality coefficients

Memory maximum peaks around 10GB against 25GB

… under submission to Computational Geometry: Theory and Algorithms

Global [Allili et al., 2017a]

Ours [Iuricich, S., Landi, De Floriani, 2016]

78 of 80

persistence module computed via Topcat Library [Gafvert, 2016]

original datasets are compared to the reduced via discrete gradient

Grades : the number of cell complexes in the filtration

Time : time needed to perform Topcat routine

… under submission to Computational Geometry: Theory and Algorithms

Persistence module evaluation

79 of 80

Persistence space evaluation�wrt number of linear restrictions

the reduction impacts the linearity coefficient w.r.t. number of restrictions
even for 4 restriction + reduced timings, reduction is better than original timings

persistence space over 4 to 100 uniformly selected linear restrictions.

original datasets are compared to reduced datasets and reduced plus reduction timings

80 of 80

Persistence space evaluation�wrt different PH computations

Time comaprison of several PH algorithm with and without Morse reduction

Computing the persistence space over 100 slices. Each color indicates results for different PH algorithms

reduced datasets:

max time sum of all slices = 2 seconds

unreduced datasets:

max time sum of all slices = 25 seconds

… under submission to Computational Geometry: Theory and Algorithms