Discrete Topological methods for visual-analytics of multivariate data
Sara Scaramuccia
Dip. Matematica – Università di Roma Tor Vergata, Italy Sep 8th 2023
Joint work with Claudia Landi, Federico Iuricich, Leila De Floriani
Multiple scalar fields
Hurricane Isabel: pressure field
Hurricane Isabel: temperature field
Overview
Overview
Topological Data Analysis
Qualitative dimension-independent analysis of large-sized data
shaping point clouds
scalar field segmentations
abstract graphs
Topology
Some topological invariants
Homology
p=0 : number of connected components
p=1 : number of independent loops
p=2 : number of independent cavities
Torus
1 : connected component
2 : independent loops
1 : independent cavity
Counts the number of independent p-dimensional holes (p≥0)
Discrete domains
Geometric simplices
Abstract simplices
From data to shapes
Simplicial complexes
Homology
Simplicial homology
Topological invariant limitations
Topological invariant limitations
From data to shapes
Filtrations from point clouds
From data to shapes
Vietoris-Rips complexes
Persistent Homology
Persistent homology is the main tool in TDA and allows for
tracking the change in homology of the shape in an increasing object
Long-living homology classes are assumed to be more important than other ones possibly associated with noise
Persistent Homology
Persistence pairs
Persistent Homology
Persistence pairs
Persistent Homology
Peristence module
large, non-homogeneous, multidimensional, abstract
filtration of simplicial complexes
algebraic invariants associated to the persistence module
a descriptor for Persistent Homology
19
Barcode
Persistence diagram
barcode – persistence diagram
Classification via Topological Features
Estimating via Persistence Diagrams: topological classifier [Chao et al., 2018], persistence-based linkage for hierarchical clustering [De Lara, 2022]
Learning Persistence Diagrams Persistent Homology layer [Brüel-Gabrielsson et al., 2020], TopoAutoEncoder [Moor et al., 2020]
Feature Extraction via Persistence Diagram: Giotto-tda library examples
with and without topological regularization
embedding of nested high-dimensional spheres
introduction of a penalty on the robustness of the connected components in the decision boundary
a penalty on the last appearing homological classes
Features extracted from point labeled 8 in MNIST dataset
vectorizations of collections of life-spans of homology classes to handle averages and feed supervised classifications
Python Lab
Examples
Basic tools in TDA: giotto TDA library giotto tda https://github.com/giotto-ai/giotto-tda
Classification of point clouds through persistence entropy and random forest method
A use case: Classification of hand-written digits
applications of TDA on the DONUT database https://donut.topology.rocks/?q=author%3A%22B.+Rieck%22
Morse Theory – main ideas
Discrete Morse functions
23
30/09/2021
Morse Theory – glossary
24
Discrete Translation
25
30/09/2021
introduced in [Forman, 1998]
The Morse complex
Discrete gradients - idea
Main results in (discrete) Morse Theory
Incidence between critical cells
29
30/09/2021
The Morse complex is a chain complex ...
30
30/09/2021
The Morse complex is a chain complex ...
31
30/09/2021
Results in DMT
32
30/09/2021
Morse Perfectness
30/09/2021
Perfectness NOT always possible
Example: the dunce hat
its homology is trivial
BUT
any discrete gradient gives more than one critical vertex
Discrete gradients and Morse complexes
35
23/05/2018
A (Forman) discrete gradient 𝑉 is a collection of pairs (𝝈,𝝉) s.t.
A 𝑉-path is a sequence of pairs (𝝈𝒊,𝝉𝒊) in 𝑉
𝝈0,𝝉0, 𝝈1,𝝉1,…, 𝝈r,𝝉r
such that for all 𝒊 = 0,…,r-1
𝝈𝒊≠ 𝝉𝒊+1 and 𝝉𝒊+1> 𝝈𝒊
Discrete gradients and Morse complexes
36
23/05/2018
Cells out of 𝑉 are called critical
Incidience among critical cells is obtained by following the connecting 𝑉-paths
The so-obtained cell complex is called discrete Morse complex
Discrete gradients and Morse complexes
37
23/05/2018
By following the gradient paths one can obtain a smaller complex whose cells are only the critical ones
The Morse complex might be seen as a simplified shape with the same homology as the given one
Univariate Background
segmentation of a triangulated terrain into equal start-ending critical cells of the height scalar field
Simplicial Homology
The most applied topological feature in TDA
Reeb graphs
levelset topology
Reeb graph
Mapper output
Persistent Homology PH
Captures the homology changes along an increasing sequence of spaces obtained by sublevel sets wrt a scalar function
41
23/05/2018
(for each k≥0)
sublevelset topology
The Morse complex
42
30/09/2021
Morse and Persistence interplay
Morse and Persistence interplay
computing the Morse complex is a preprocessing for PH computations
Persistence for Morse-based segmentation:
low persistence pairs are removed and the new incidences are found following the gradient
Univariate case: Discrete Morse Theory and PH
[Mischaikow, Nanda, 2013]
[Allili et al., 2017a]
Relative-perfectness for scalar functions
Function graphics
barcode
positive critical cells correspond to starting points for bars in the barcode
negative critical cells correspond to ending points for bars in the barcode
[Robins et al., 2011]
Multivariate Case
discrete critical 0-/1-/2-cells of a torus w.r.t. ϕ(x,y,z) = (x,y)
Multiparameter Persistent Homology - MPH
Combines multiple independent increasing family of spaces by considering sublevel sets with respect to multiple functions
(for each k≥0)
INTERSECTION
MPH more informative than PH
MPH more informative than PH
PH is invariant under horizontal translations of graphics
Mutual locations of features
Mutual Locations of Features
Mutual Locations of Features
Compatibility for multiple parameters
𝑓 : ∑ → ℝ
𝑭 : ∑ → ℝn
[Mischaikow, Nanda, 2013]
[Allili et al., 2017a]
𝑉 compatible with 𝑓 (or 𝑭) ensures that the associated Morse complex has the same
persistent homology
multiparameter persistent homology
Multivariate relative-perfectness
new homology classes can appear by adding no extra cell
critical cell can destroy homology classes not present in any previous step
Defining optimality …
…generalizing to 𝑭 : ∑ → ℝn
𝑉 compatible with 𝑭 is optimal w.r.t 𝑭 if
WARNING: each critical cell is either positive or negative with respect to the relative pair (∑u, ∪∑v) and not to previous steps
…translating Robins et al. Def into equivalent terms
𝑉 compatible with 𝑓 is optimal w.r.t 𝑓 if each critical cell is either positive or negative.
Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏 (optimal up to 2-dimensional domains) [S., Iuricich, Landi, De Floriani, 2020]
𝒏=1
𝒏>1
global
local
[Robins et al, 2011]
[Allili et al, 2017a]
[Iuricich, S., Landi, De Floriani, 2016]
Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏
ComputeLowerStar
computes the lower star of a cell
𝐿𝑭 (𝝈)={𝝉∈ ∑ | 𝝉 > 𝝈 and 𝑭(𝝉)≼ 𝑭(𝝈)}
HomotopyExpansion
TIME COMPLEXITY O(|Star| log(|Star|))
TIME COMPLEXITY O(|Star|)
with |Star|:=worts-case|𝐿𝑭 |
Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏
Introduction of a well-extensible indexing 𝐼 over vertexes and extended by 𝐼(𝝈):=maxv∈𝝈 𝐼(v) s.t.
𝑭(𝝈) ≼ 𝑭(𝝉) ⇒ 𝐼(𝝈) ≤ 𝐼(𝝉)
for each vertex
ComputeLowerStar w.r.t. 𝐼
SplitLowerStar into level sets
HomotopyExpansion over each level set
Original Function
Indexing
(3,0)v
(3,1)e
(3,3)t (3,3)e
(3,2)t (3,2)e
Properties
asymptotical time cost
Optimality
Computing a discrete gradient compatible with 𝑭:𝜮⟶ℝ𝒏
Compatible discrete gradients and multivariate visualization
a discrete gradient compatible with two functions captures where the two functions increase in opposite directions
Compatible discrete gradients and multivariate visualization
A point x in a differential manifold 𝑫 is called Pareto critical, w.r.t. a vector-valued function of class C2, iff the gradients of the components are l.d. through positive coefficients
the Pareto critical points of a torus w.r.t. ϕ(x,y,z) = (x,y)
Compatible discrete gradients and multivariate visualization
discrete critical 0-/1-/2-cells of a torus w.r.t. ϕ(x,y,z) = (x,y)
Compatible discrete gradients and multivariate visualization
Triangulation problem detected on the sphere case
Simplifying multivariate data through critical clusters
Extrema clusters of cardinality at least 10
Extrema clusters of cardinality at least 100
Extrema clusters of cardinality at least 400
Extrema clusters of cardinality at least 2000
extrema clusters
defined as the connected components of the intersection of maxima and minima clusters in the incidence graph
Concluding remarks
Warnings:
Concluding remarks
Warnings:
.....thanks
Additional slides
Python Lab
Examples
Basic tools in TDA: giotto TDA library giotto tda https://github.com/giotto-ai/giotto-tda
Classification of point clouds through persistence entropy and random forest method
A use case: Classification of hand-written digits
applications of TDA on the DONUT database https://donut.topology.rocks/?q=author%3A%22B.+Rieck%22
contacts
PH
Applied, from shape analysis to:
Introduced to outliers handling in PH [Carlsson, Zomorodian, 2007]
Applied to shape analysis
MPH
Ongoing research in
Ongoing research in
Discrete Topological Spaces
simplicial complexes
cubical complexes (regular or not)
Discrete Spaces:�regular cell complexes with intersection property
Definition
A cell complex ∑:
simplicial and cubical complexes satisfy the definition
Descriptors for MPH
stable, complete and not discrete
stable, not complete and not discrete
is the collection of all barcodes B(L) relative to a single direction L
Computing MPH
Persistence module retrieval
high computational costs, complete information mainly relevant for theoretical purposes
Persistence space retrieval
via foliation method [Biasotti et al., 2008], incomplete information, already applied to shape retrieval [Biasotti et al., ‘PHOG’, 2013]
MPH preprocessing
Morse-based to reduce the input size
Issues in Morse-based preprocessing for MPH:
Results: size reduction
|M|
Time G
Time M
scale factor
timings
number of fields
Iuricich, S., Landi, De Floriani: “A discrete Morse-based approach to multivariate data”, Simposium in Visualisation in SIGGRAPH Asia 2016.
number of critical cells left
time to retrieve the gradient V
time to retrieve the complex M from V
7.3x < … < 40x
x,xx minutes
up to 3 …. but independent
Results: comparing to related works
Timings linear over all considered datasets in the number of cells with
very different proportionality coefficients
Memory maximum peaks around 10GB against 25GB
… under submission to Computational Geometry: Theory and Algorithms
… under submission to Computational Geometry: Theory and Algorithms
Global [Allili et al., 2017a]
Ours [Iuricich, S., Landi, De Floriani, 2016]
persistence module computed via Topcat Library [Gafvert, 2016]
original datasets are compared to the reduced via discrete gradient
Grades : the number of cell complexes in the filtration
Time : time needed to perform Topcat routine
… under submission to Computational Geometry: Theory and Algorithms
Persistence module evaluation
Persistence space evaluation�wrt number of linear restrictions
persistence space over 4 to 100 uniformly selected linear restrictions.
original datasets are compared to reduced datasets and reduced plus reduction timings
Persistence space evaluation�wrt different PH computations
Time comaprison of several PH algorithm with and without Morse reduction
Computing the persistence space over 100 slices. Each color indicates results for different PH algorithms
reduced datasets:
max time sum of all slices = 2 seconds
unreduced datasets:
max time sum of all slices = 25 seconds
… under submission to Computational Geometry: Theory and Algorithms