1 of 35

Persistent Topological Features in Large Language Models

Yuri Gardinazzi

Workshop: Interpretability in LLMs using Geometrical and Statistical Methods

27/5/2025

With Karthik Viswanathan, Giada Panerai, Alessio Ansuini, Alberto Cazzaniga & Matteo Biagetti (ICML 2025)

2 of 35

What’s going on inside Large Language Models?

[Figure: Input prompt → LLM → output. Tokens shown: "welcome", "to", "the", "geom.", "workshop".]

3 of 35

What’s going on inside Large Language Models?

Step 1: Tokenization

[Figure: Input prompt → LLM → output, with the last token representation marked. Tokens shown: "welcome", "to", "the", "geom.", "workshop".]

4 of 35

What’s going on inside Large Language Models?

[Figure: the token "hello" passes through the input embedding, the Transformer layers, and the output layers.]

Model         # Transformer layers   Hidden dimension
Llama 2 7B    32                     4096
Llama 3 8B    32                     4096
Mistral 7B    32                     4096
Pythia 6.9B   32                     4096

5 of 35

What’s going on inside Large Language Models?

[Figure: the prompt "hello" and its layers of internal representations; the last token is marked.]
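To make the "last token, layer by layer" picture concrete, here is a minimal sketch of how one point cloud per layer could be assembled: one point per prompt, taken from that prompt's last token. The nested-list layout `hidden_states[p][l][t]` and the function name are assumptions made for illustration, not the authors' code.

```python
def last_token_clouds(hidden_states):
    """Return one point cloud per layer.

    hidden_states[p][l][t] is assumed to be the vector (a list of floats)
    of token t of prompt p at layer l. This layout is hypothetical.
    """
    n_layers = len(hidden_states[0])
    return [
        [prompt[layer][-1] for prompt in hidden_states]  # last token only
        for layer in range(n_layers)
    ]


# Toy data: 2 prompts, 2 layers, hidden dimension 2.
toy = [
    # prompt 0 has 2 tokens
    [[[0.0, 0.1], [0.2, 0.3]],
     [[1.0, 1.1], [1.2, 1.3]]],
    # prompt 1 has 3 tokens
    [[[0.4, 0.5], [0.6, 0.7], [0.8, 0.9]],
     [[1.4, 1.5], [1.6, 1.7], [1.8, 1.9]]],
]
clouds = last_token_clouds(toy)
```

Each `clouds[l]` is then a set of points in the hidden space of layer `l`, which is the kind of object the topological analysis takes as input.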

6 of 35

What’s going on inside Large Language Models?

Hypothesis: The distribution of prompts is related to the inner workings of LLMs.

Goal: Describe global features of LLMs that are consistent across different models.

Strategy: Analyse internal representations with Topological Data Analysis.

7 of 35

Topological Data Analysis comes to the rescue!

Connected components

Loops

Voids

We can look at the shape of data

  • Coordinate invariance
  • Deformation invariance
  • Information compression

8 of 35

Persistent Homology: Vietoris-Rips filtration

[Figure: four stages (1 to 4) of a Vietoris-Rips filtration, tracking connected components and loops.]
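As a small illustration of what the filtration tracks, the sketch below builds the Vietoris-Rips complex of a point cloud at a single scale and counts connected components (b0) and loops (b1) over GF(2). It assumes the convention that two points are joined when their distance is at most `radius` (some references use twice the radius); the function name is hypothetical, and a real analysis would use a dedicated TDA library rather than this brute-force version.

```python
from itertools import combinations
from math import dist


def rips_betti(points, radius):
    """Return (b0, b1) of the Vietoris-Rips complex at one scale."""
    n = len(points)
    # Edges: pairs of points at distance <= radius (assumed convention).
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= radius]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangles: triples whose three edges are all present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in edge_index for p in combinations(t, 2))]

    # b0: number of connected components, via union-find.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    b0 = len({find(i) for i in range(n)})

    # Rank of the triangle-to-edge boundary map over GF(2), via an XOR basis.
    basis = {}
    for t in triangles:
        v = 0
        for p in combinations(t, 2):
            v ^= 1 << edge_index[p]
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    rank_d2 = len(basis)

    # b1 = dim(cycles) - dim(boundaries); dim(cycles) = |E| - (n - b0).
    b1 = len(edges) - (n - b0) - rank_d2
    return b0, b1


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(rips_betti(square, 0.5))  # (4, 0): four isolated points
print(rips_betti(square, 1.0))  # (1, 1): one component, one loop
print(rips_betti(square, 1.5))  # (1, 0): the loop has been filled in
```

On the unit square, the loop is born when the four sides appear and dies once the diagonals add triangles; persistent homology records exactly this birth and death across scales.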

12 of 35

What if your point cloud evolves in time?

13 of 35

ZigZag persistence

[Figure: zigzag persistence, tracking connected components and loops.]

Carlsson, Gunnar, and Vin de Silva. "Zigzag persistence." Foundations of Computational Mathematics 10 (2010): 367-405.
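For reference, a zigzag diagram in the sense of Carlsson and de Silva is a sequence of spaces connected by maps that may point in either direction. One common way to build it from a sequence of simplicial complexes K_1, ..., K_n on the same vertex set is to interleave intersections, as sketched below. Whether this talk uses intersections or unions is not stated on the slide, so the choice of intersections here is an assumption.

```latex
K_1 \hookleftarrow K_1 \cap K_2 \hookrightarrow K_2
    \hookleftarrow K_2 \cap K_3 \hookrightarrow \cdots
    \hookleftarrow K_{n-1} \cap K_n \hookrightarrow K_n
```

Applying p-dimensional homology to every term gives a zigzag module, which decomposes into intervals; each interval records where along the sequence a p-dimensional feature is born and where it dies.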

14 of 35

ZigZag persistence on transformers

15 of 35

Filtration: kNN graph

Nonlinear transformations from layer to layer make the choice of a radius for Vietoris-Rips nontrivial.

Three mutually adjacent edges form a triangle

Six mutually adjacent edges form a tetrahedron
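The construction can be sketched in a few lines: build a k-nearest-neighbor graph, then promote every set of mutually connected vertices to a simplex (three vertices give a triangle, four give a tetrahedron). The symmetrization rule used here (an edge exists if either point lists the other among its k nearest neighbors) and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def knn_graph(points, k):
    """Undirected kNN graph as a set of edges (i, j) with i < j."""
    n = len(points)
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        for j in others[:k]:
            # Assumed rule: keep the edge if either endpoint selects the other.
            edges.add((min(i, j), max(i, j)))
    return edges


def cliques(edges, n, size):
    """All vertex sets of the given size whose pairs are all edges."""
    return [c for c in combinations(range(n), size)
            if all(p in edges for p in combinations(c, 2))]


points = [(0, 0), (1, 0), (0, 1), (5, 5)]
edges = knn_graph(points, k=2)
triangles = cliques(edges, len(points), 3)   # [(0, 1, 2), (1, 2, 3)]
tetrahedra = cliques(edges, len(points), 4)  # []
```

Because the graph depends only on neighbor ranks, it does not require choosing a distance scale that would have to be comparable across layers.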

16 of 35

Toy Example: “Let’s do some calendar math. Four months from [MONTH]

18 of 35

Results: Effective Persistence Image

19 of 35

Results: Birth Relative Frequency

Rate at which new p-dimensional holes are created
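A minimal sketch of such a birth rate, assuming each p-dimensional feature is summarized by a (birth layer, death layer) pair and that the rate is normalized by the total number of features. The slide gives only the one-line description, so both the normalization and the function name are assumptions.

```python
from collections import Counter


def birth_relative_frequency(intervals, n_layers):
    """Fraction of all features that are born at each layer.

    intervals: list of (birth_layer, death_layer) pairs (assumed format).
    """
    births = Counter(b for b, _ in intervals)
    total = len(intervals)
    return [births[layer] / total if total else 0.0
            for layer in range(n_layers)]


intervals = [(0, 3), (0, 1), (2, 4), (3, 4)]
freq = birth_relative_frequency(intervals, n_layers=4)
print(freq)  # [0.5, 0.0, 0.25, 0.25]
```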

20 of 35

Results: Inter-Layer Persistence (1/2)

Fraction of loops alive at layer L1 that are still alive at layer L2 (and stayed alive along the whole path between them)
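The fraction described above can be sketched as follows, assuming each feature is a half-open interval [birth, death) already expressed in layer indices. Since an interval is contiguous, a feature alive at both layers is automatically alive on the whole path between them. The interval convention and the function name are assumptions, and this sketch is unweighted.

```python
def inter_layer_persistence(intervals, l1, l2):
    """Fraction of features alive at l1 that are also alive at l2.

    A feature (b, d) is taken to be alive at layer l when b <= l < d
    (assumed half-open convention).
    """
    alive_l1 = [(b, d) for b, d in intervals if b <= l1 < d]
    if not alive_l1:
        return 0.0
    lo, hi = min(l1, l2), max(l1, l2)
    # Alive on the whole path [lo, hi]: born by lo and not dead by hi.
    survivors = [(b, d) for b, d in alive_l1 if b <= lo and hi < d]
    return len(survivors) / len(alive_l1)


intervals = [(0, 3), (0, 1), (2, 4), (1, 5)]
forward = inter_layer_persistence(intervals, 2, 4)   # 1 of 3 features survives
backward = inter_layer_persistence(intervals, 2, 0)  # 1 of 3 already existed
```

The second argument may be smaller than the first, which corresponds to asking how many features alive at a layer were already present at an earlier one.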

21 of 35

Results: Inter-Layer Persistence (1/2)

Probability that features alive at a given layer are still alive at earlier or later layers.

Low α: more weight to short-lived features

High α: more weight to long-lived features

23 of 35

Results: Inter-Layer Persistence (2/2)

24 of 35

Results: Relation to performance

25 of 35

Results: Layer Pruning

Other works: Gromov et al. (2024), Men et al. (2024)

Layers are pruned by cutting the block of layers whose Inter-Layer Persistence lies within 10% of its maximum value.
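One possible reading of this selection rule, as a sketch: given a single positive score per layer, take the contiguous block around the maximum whose scores stay within 10% of that maximum. How a per-layer score is derived from Inter-Layer Persistence is not specified on the slide, so the `scores` input, the contiguity requirement, and the function name are all assumptions.

```python
def block_near_max(scores, tolerance=0.10):
    """Contiguous block (start, end), inclusive, around the maximum score.

    Assumes positive scores; `scores` is a hypothetical per-layer summary.
    """
    peak = max(range(len(scores)), key=scores.__getitem__)
    threshold = (1.0 - tolerance) * scores[peak]
    start = end = peak
    # Grow the block in both directions while scores stay above threshold.
    while start > 0 and scores[start - 1] >= threshold:
        start -= 1
    while end < len(scores) - 1 and scores[end + 1] >= threshold:
        end += 1
    return start, end


scores = [0.2, 0.5, 0.92, 1.0, 0.95, 0.6, 0.91]
block = block_near_max(scores)
print(block)  # (2, 4)
```

Layer 6 also exceeds the threshold in this toy example but is excluded, because it is separated from the peak by a layer that falls below it.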

26 of 35

Conclusions

  • We apply TDA to interpret the behavior of LLMs.

  • With zigzag persistence we study trajectories of point clouds that evolve in time (or through layers).

  • We can distinguish different phases of prompt processing.

  • We can measure the rearrangement of points through different layers.

yuri.gardinazzi@areasciencepark.it

27 of 35

Thank you!

28 of 35

Support: Inter-Layer Persistence

Power-weighted Inter-Layer Persistence.

Probability that features alive at a given layer are still alive at earlier or later layers.

Betti number: number of p-dimensional holes

29 of 35

Inter-Layer Persistence

30 of 35

Birth Relative Frequency

31 of 35

Persistence

32 of 35

Bigger models

33 of 35

Different kNN – Inter-Layer Persistence

34 of 35

Sliding Window
