1 of 55

Theoretical Foundations for Artificial Intelligence (AI) Inspired by Understanding Biological Intelligence (BI): Detecting Phase Transitions & Quantifying the Degree of Emergence in Deep Learning


Paul Bogdan

Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California

pbogdan@usc.edu

The 7th workshop on Neural Scaling Laws - Scaling, Transfer & Multilingual Models, Monday, July 22, 2024, Vienna, Courtyard Vienna Prater/Messe


2 of 55

Motivation & Grand Challenges

  • What are the topological characteristics of living neural networks (LNNs)?

  • Do distributed interaction strategies of LNNs help artificial neural networks (ANNs)?

  • How can we detect phase transitions in LNNs and ANNs as well as control them in ANNs?

  • How can we quantify the degree of emergence in ANNs?


3 of 55

Complexity of Neuronal Networks

  • Neuronal networks exhibit a self-optimizing behavior
    • Capable of optimizing network latency, throughput, and robustness without a central controller
  • Neuronal network topologies
    • Can’t be captured by existing models
      • Erdos-Renyi (ER)
      • Strogatz-Watts (WS)
      • Barabasi-Albert (BA)
    • Weighted multifractal graph (WMG) generator provides better fitting
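As a rough illustration of the idea behind multifractal graph generators (a sketch of the general construction, not the exact WMG formalism of Yin et al.; the kernel `P`, interval lengths, and sizes below are made up), nodes get coordinates in [0, 1), and the linking probability of a pair is the product of kernel entries along K recursion levels:

```python
import numpy as np

def multifractal_graph(P, lengths, K, n, rng=None):
    """Sample an undirected graph with multifractal link probabilities:
    [0, 1) is split into M groups of the given lengths, recursively K
    times; two nodes link with probability equal to the product of the
    kernel entries P[i, j] of their groups at each recursion level."""
    rng = np.random.default_rng(rng)
    u = rng.random(n)                          # node coordinates in [0, 1)
    bounds = np.cumsum(lengths)
    A = np.zeros((n, n), dtype=int)

    def group(x):
        return int(np.searchsorted(bounds, x, side='right'))

    for a in range(n):
        for b in range(a + 1, n):
            p, xa, xb = 1.0, u[a], u[b]
            for _ in range(K):                 # descend K recursion levels
                i, j = group(xa), group(xb)
                p *= P[i, j]
                xa = (xa - (bounds[i] - lengths[i])) / lengths[i]
                xb = (xb - (bounds[j] - lengths[j])) / lengths[j]
            if rng.random() < p:
                A[a, b] = A[b, a] = 1
    return A

P = np.array([[0.9, 0.2], [0.2, 0.6]])         # illustrative linking kernel
A = multifractal_graph(P, np.array([0.4, 0.6]), K=3, n=60, rng=0)
```

Asymmetric kernels and unequal interval lengths are what give the generated networks their heterogeneous, multifractal structure.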


Yin et al. “Network science characteristics of brain-derived neuronal cultures deciphered from quantitative phase imaging data”, Nature Scientific Reports, 2020

4 of 55

New Mathematics for Complex Networks

  • Discover generating rules of complex networks (CNs)
    • Capture heterogeneity & multifractality of CNs at functional level
    • Decode the complexity encoded in interaction weights
  • Heterogeneity of nodes & their interactions
    • Node attributes and group behavior
    • Link predictions are varying
    • Time-varying and bipartite features
  • Dealing with uncertainty in data
    • Incomplete observations, noisy system data, detecting missing / spurious interactions
      • How much info is needed to infer a hidden generating model?
    • Denoising networks and graph neural networks (GNNs)
  • Modeling brain & genomic (4D nucleome) networks


5 of 55

Weighted Multifractal Graph Generator


6 of 55

WMG Inference vs. Network Complexity

  • Reconstruct parameters of WMG
    • N0 = 10^5 ground-truth network
    • 500 nodes sufficient to infer WMGG
  • Limits of WMG recoverability
    • Mean relative absolute error (RAE) increases with multifractal spectrum width


Non-recoverable

Recoverable

R. Yang, F. Sala, and P. Bogdan, Nature Communications Physics, 2021

7 of 55

WMGG Can Mine Insect Brain Data (I)

  • Modeling Drosophila brain connectome
    • Sparse network of 10790 neurons and 6444 identified synaptic connections
  • WMGG reconstruction
    • Lower bound of the log-likelihood (for learning the WMGG) converges fast
    • Less than 20 iterations are required for WMGG identification


Inferred WMG model with M=2, K=3

Data and brain image source: Xu, C. Shan, et al. "A connectome of the adult Drosophila central brain." (bioRxiv 2020). Dataset: https://www.janelia.org/project-team/flyem/hemibrain

8 of 55

WMGG Can Mine Insect Brain Data (II)

  • WMGG captures Drosophila's graph properties well
    • Clustering-coefficient distribution
    • Node-degree distribution
    • Edge-weight distribution


R. Yang, F. Sala, and P. Bogdan, Nature Communications Physics, 2021

9 of 55

Mining Cognition From Scarce Brain Nets

  • Mining cognition may require mining heterogeneous neuronal networks
    • Common network science metrics (e.g., degree, clustering) may fail
      • What if fewer nodes and links are observed?
      • What if we have to distinguish between smaller networks?
      • What if we have to distinguish between networks of various sizes?
      • ….


R. Yang, F. Sala, and P. Bogdan, Nature Communications Physics, 2021

10 of 55

Infer Generators of Brain Networks

  • Drosophila’ connectome in various regions
    • OL – optic lobe
      • ME – medulla
      • AME – accessory medulla
      • LO – lobula
      • LOP – lobula plate
  • WMGG analysis
    • Various cognitive areas exhibit distinct WMGGs

10

 

 

 

 

 

 

 

 

 

 

    • AL – antennal lobe

R. Yang, F. Sala, and P. Bogdan, Nature Communications Physics, 2021

11 of 55

Motivation & Grand Challenges

  • What are the topological characteristics of LNNs?

  • Do distributed interaction strategies of LNNs help ANNs?

  • How can we detect phase transitions in LNNs and ANNs as well as control them in ANNs?

  • How can we quantify the degree of emergence in ANNs?


12 of 55

Infusing Flocking Intelligence into ANNs

  • Train neural networks (NNs) following collective motion rules
  • Some units (leaders) are “informed”, others are not
    • Leader neurons informed with prediction errors
    • Follower neurons learn by aligning with leaders
  • Centralized deep neural networks (DNNs) vs. distributed ANNs
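A minimal toy sketch of the leader-follower rule above (an illustrative assumption-laden stand-in, not the paper's LFNN): linear "workers" where the leader descends its local prediction-error gradient and followers, which never see the error, align with the mean leader weights, as in collective motion:

```python
import numpy as np

def lfnn_step(W, leaders, X, y, lr=0.1, align=0.5):
    """One toy leader-follower update on a pool of linear 'workers'
    (rows of W).  Leaders descend the gradient of their local MSE loss;
    followers move toward the mean leader weights (alignment rule)."""
    mean_leader = W[list(leaders)].mean(axis=0)
    for k in range(len(W)):
        if k in leaders:
            grad = 2 * X.T @ (X @ W[k] - y) / len(y)   # local MSE gradient
            W[k] = W[k] - lr * grad
        else:
            W[k] = W[k] + align * (mean_leader - W[k])
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
W = rng.normal(size=(5, 3))                 # 5 workers
leaders = {0}                               # ~20% leaders, as in the slides
for _ in range(200):
    W = lfnn_step(W, leaders, X, y)
```

After training, all workers, leaders and followers alike, recover the target weights, even though only the leader is "informed" of the prediction error.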


C. Yin et al., arXiv link: https://arxiv.org/pdf/2310.07885.pdf

13 of 55

DNNs vs. Leader-Follower Neural Networks


14 of 55

LFNNs Performance vs. Leadership Size

  • Leader-follower neural networks (LFNNs)
    • LFNN's performance & size of leadership exhibit a complex relationship
    • LFNNs perform the worst when leaders and followers are split about 50/50
  • Ablation study of loss terms
    • With all three losses
    • Without follower loss
    • Without leader loss
    • With global loss only
    • Leader local loss is essential


C. Yin et al., arXiv link: https://arxiv.org/pdf/2310.07885.pdf

15 of 55

BP-Free Leader-Follower Neural Network


C. Yin et al., arXiv link: https://arxiv.org/pdf/2310.07885.pdf

16 of 55

Leaders vs. Followers

  • MNIST classification with a BP-free LFNN
      • With 2 hidden layers (32 workers in each layer)
      • Leader / follower ratio: 20 / 80
  • Leadership development

  • Leaders vs. followers
    • Networks with follower workers alone (red) achieve better results than networks with only leaders (brown)


C. Yin et al., arXiv link: https://arxiv.org/pdf/2310.07885.pdf

17 of 55

Motivation & Grand Challenges

  • What are the topological characteristics of LNNs?

  • Do distributed interaction strategies of LNNs help ANNs?

  • How can we detect phase transitions in LNNs and ANNs as well as control them in ANNs?

  • How can we quantify the degree of emergence in ANNs?


18 of 55

Differential Geometry of Networks

  • Ricci curvature in geometry

    • Measures the deviation of a manifold from being Euclidean in various tangential directions:
      • determined by averaging sectional curvatures (tangent planes)
    • Quantifies the volume growth

  • Ricci curvature in networks
    • Quantifies topological properties of graphs

  • Discretization
    • Ollivier-Ricci curvature (ORC)
    • Forman-Ricci curvature (FRC)

19 of 55

What is the Curvature of a DNN?


Geometric properties

Learning task

  • Can the entropic measure give insights into model performance?

  • Can the Forman-Ricci network entropy help us identify the number of epochs to train a DNN?

  • Can we use this differential geometry / curvature framework to prevent overfitting?
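One plausible way to turn edge curvatures into the entropic measure asked about above (an assumption of this sketch: Shannon entropy of the normalized, shifted curvature distribution over edges, tracked per training epoch):

```python
import math

def frc_network_entropy(curvatures):
    """Shannon entropy of the normalized Forman-Ricci curvature
    distribution over edges.  Sketch assumption: curvatures are shifted
    to be non-negative before normalization, yielding one scalar
    'geometric disorder' signal per network snapshot."""
    shift = min(curvatures)
    w = [c - shift + 1e-9 for c in curvatures]   # small offset avoids log(0)
    z = sum(w)
    p = [x / z for x in w]
    return -sum(x * math.log(x) for x in p)

h_uniform = frc_network_entropy([1.0, 1.0, 1.0, 1.0])   # maximal: log 4
h_skewed = frc_network_entropy([-2.0, 0.0, 1.0])
```

A flat curvature distribution maximizes the entropy; a concentrated one lowers it, which is the kind of scalar that can be compared against training accuracy across epochs.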

20 of 55

DNN’s Accuracy vs. FRC Entropy: Fashion-MNIST


M.R. Znaidi et al. "A unified approach of detecting phase transition in time-varying complex networks" Scientific Reports 13, no. 1, 2023

21 of 55

DNN’s Accuracy vs. FRC Entropy: CIFAR-10 Dataset


M.R. Znaidi et al. "A unified approach of detecting phase transition in time-varying complex networks." Scientific Reports 13, no. 1, 2023

22 of 55

Findings & Future Work


  • Can the entropic measure give insights into model performance?
    • Correlation between Forman-Ricci curvature network entropy and training accuracy
  • Can the Forman-Ricci network entropy help us identify the number of epochs to train a DNN?
    • Detect the learning phase transitions
  • Can we use this differential geometry / curvature framework to prevent overfitting?
    • Perform dropout based on the Forman-Ricci curvature of each node

23 of 55

Motivation & Grand Challenges

  • What are the topological characteristics of LNNs?

  • Do distributed interaction strategies of LNNs help ANNs?

  • How can we detect phase transitions in LNNs and ANNs as well as control them in ANNs?

  • How can we quantify the degree of emergence in ANNs?


24 of 55

Mono-Fractal Analysis

  • Box-counting method
    • Partition the data set with the minimal number N(r) of boxes of size r
    • Fractal dimension d is determined by d = − lim_{r→0} log N(r) / log r
    • Fractal: power-law scaling dependence N(r) ∝ r^(−d)

(Figure: box-counting fit yielding d = 1.719)
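The box-counting fit can be sketched in a few lines; the point set and the box sizes below are illustrative:

```python
import numpy as np

def box_counting_dimension(points, sizes):
    """Box-counting estimate of the fractal dimension of a point set:
    count occupied boxes N(r) for each box size r, then fit the power
    law N(r) ~ r^(-d) as a slope in log-log coordinates."""
    counts = [len(np.unique(np.floor(points / r), axis=0)) for r in sizes]
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

# Sanity check: points filling the unit square should give d close to 2
pts = np.random.default_rng(0).random((20000, 2))
d = box_counting_dimension(pts, np.array([0.05, 0.1, 0.2]))
```

Each occupied box is identified by its integer grid cell at resolution r, so the occupied-box count is just the number of distinct floored coordinates.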

25 of 55

A Simple Example

(Figure: worked box-counting example on a 5×5 grid of measure values between 0.05 and 0.5, shown at two box sizes.)

26 of 55

Node-based Multifractal Analysis (I)

  • How can we identify a node (neuron, glia) responsible for a higher role in a cognitive task and distinguish it from the rest of the network bulk?
    • Cover a graph with boxes of varying radii for each node

    • Assign a probability measure

    • Distortion factor q captures the topological differences
    • Topological partition function


q>0: Prioritize abundant patterns

q<0: Prioritize rare patterns

Multi-fractal scaling behavior: partition function Z(q, r) = Σ_i [m_i(r)]^q ∝ r^τ(q)

X. Xiao, H. Chen, and P. Bogdan, “Deciphering the generating rules and functionalities of complex networks”, Nature Scientific Reports, 2021

27 of 55

Node-based Multifractal Analysis (II)

  • Mass exponent (topological free energy)
      • Distinguishes between monofractal and multifractal behavior
  • Generalized fractal dimension 𝐷(𝑞) = 𝜏(𝑞)/(𝑞 − 1)

  • Width of multifractal spectrum reflects network heterogeneity
      • The wider the spectrum, the more heterogeneous the network is

  • Lipschitz-Hölder exponent 𝛼 characterizes the density and complexity of the network
      • Larger 𝛼 implies higher network density
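The quantities above, the partition function, the mass exponent τ(q), and the generalized dimension D(q) = τ(q)/(q − 1), can be estimated by a log-log fit across scales; a minimal sketch assuming the box measures m_i(r) are already available:

```python
import numpy as np

def mass_exponents(measures_by_scale, qs):
    """Fit the mass exponent tau(q) from the partition function
    Z(q, r) = sum_i m_i(r)**q ~ r**tau(q), given box measures per scale
    r, and return the generalized dimensions D(q) = tau(q)/(q - 1)."""
    rs = np.array(sorted(measures_by_scale))
    taus = []
    for q in qs:
        logZ = [np.log(np.sum(np.asarray(measures_by_scale[r]) ** q))
                for r in rs]
        tau, _ = np.polyfit(np.log(rs), logZ, 1)   # slope = tau(q)
        taus.append(tau)
    taus = np.array(taus)
    qs = np.asarray(qs, dtype=float)
    D = np.where(qs != 1.0, taus / np.where(qs != 1.0, qs - 1.0, 1.0), np.nan)
    return taus, D

# Uniform (monofractal) measure on [0, 1]: every D(q) should equal 1
m = {0.5: [0.5] * 2, 0.25: [0.25] * 4, 0.125: [0.125] * 8}
taus, D = mass_exponents(m, [0, 2, 3])
```

For a monofractal, D(q) is constant in q; a q-dependent D(q), equivalently a wide f(α) spectrum, signals heterogeneity.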


𝜏(𝑞) vs 𝑞

Legendre transform: 𝛼(𝑞) = d𝜏/d𝑞, 𝑓(𝛼) = 𝑞𝛼 − 𝜏(𝑞)

Multifractal spectrum: 𝑓(𝛼) vs 𝛼

28 of 55

Emergence & Self-Organization in AI


  • Study the emergence of the intelligence of LLMs
    • Improvement of LLM responses with increasing training iterations

29 of 55

(Artificial) Neuronal Interaction Network (NIN)

  • Map LLM’s artificial neurons and their connections onto NIN

  • Node-based fractal analysis


30 of 55

Multifractal Analysis of LLM Training


31 of 55

Emergence in LLM Training

  • Degree of emergence quantification
    • Lambada OpenAI

    • PIQA

    • LogiQA


Relevance = Predictability + New Information

32 of 55

CPS Group & Collaborators

Graduated & current PhD students: Yuankun Xue, Valeriu Balaban, Gaurav Gupta, Hana Koorehdavoudi, Jayson Sia, Mohamed R. Znaidi, Panagiotis Kyriakis, Yao Xiao, Ruochen Yang, Mingxi Cheng, Qi Cao, Xiong Ye Xiao, Chenzhong Yin, Emily A. Reed, Kien Nguyen, Heng Ping, Shixuan Li, Gengshuo Liu, Shukai Duan

Brief list of collaborators: Jyotirmoy Deshmukh (USC), Edmond Jonckheere (USC), George Pappas (UPenn), Sergio Pequito (Delft Univ.), Justin Rhodes (UIUC), Roozbeh Kiani (NYU), James Boedicker (USC), Radu Balan (UMD), Frederic Sala (UWisconsin), Nicholas Kotov (Univ. of Michigan)

Thank you!

More info at https://cps.usc.edu/

33 of 55

Graph Theory-aware ML (GTML)


  • Distance-based feature matrices for prediction of protein complexes

  • GTML exhibits higher performance
    • Forman-Ricci curvature
    • Ollivier-Ricci curvature
    • Fractal dimension

M. Cha, E.S.T. Emre, X. Xiao, J.-Y. Kim, P.B., S.J. VanEpps, A. Violi, N.A. Kotov, Unifying Structural Descriptors for Biological and Bioinspired Nanoscale Complexes, Nature Comp. Sci., 2022

34 of 55

Brain Age (BA) Prediction via an Interpretable 3D-CNN

  • (A) Proportions of participants in datasets
  • (B) T1-weighted MRIs & 3D saliency maps
  • (C) Participants split by sex and assigned randomly to training & test sets
  • (D) 3D-CNN’s
    • Input: MRIs
    • Output: BA estimates


Chenzhong Yin, Mihai Udrescu, Gaurav Gupta, Mingxi Cheng, Andrei Lihu, Lucretia Udrescu, Paul Bogdan, David M. Mannino, and Stefan Mihaicuta. "Fractional dynamics foster deep learning of COPD stage prediction." Advanced Science (2023): 2203485. https://onlinelibrary.wiley.com/doi/full/10.1002/advs.202203485

C. Yin et al., "Anatomically interpretable deep learning of brain age captures domain-specific cognitive impairment" Proceedings of the National Academy of Sciences, 2023 https://www.pnas.org/doi/abs/10.1073/pnas.2214634120

35 of 55

Block-wise BP-Free (BWBPF) Network


36 of 55

Results on CIFAR-10 and Tiny-ImageNet

  • Error rate of multiple networks
    • Networks’ results on CIFAR-10

    • Networks’ results on Tiny-ImageNet

  • BWBPF with multiple blocks
    • With K blocks on ResNet101


A. Cheng, H. Ping, Z. Wang, X. Xiao, C Yin, S. Nazarian, M. Cheng, and P. Bogdan. "Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks" IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024 https://ieeexplore.ieee.org/document/10447377

    • With K blocks on ResNet152

37 of 55

WMG Can Mine 4D Nucleome Data

  • Chromatin contact maps of the yeast genome
    • Chromosomal interactions undergo reorganization in different growth states
      • Quiescent state:
        • larger off-diagonal linking probabilities (stronger inter-chromosomal interactions)
      • Exponentially growing state:
        • larger diagonal linking probabilities (stronger intra-chromosomal interactions)
  • WMGG formalism can
    • Distinguish the growth states
    • Model structural & dynamical features of chromosomes
    • Help understand their transcriptional regulation mechanisms


Yeast in quiescence

Exponentially growing yeast

Data source: M.T. Rutledge et al. The yeast genome undergoes significant topological reorganization in quiescence. Nucleic acids research, 2015

38 of 55

Chemical, Geometrical & Graph Descriptors


39 of 55

Graph Theory-aware ML (GTML)


  • Distance-based feature matrices for prediction of protein complexes

  • Graph theory-aware machine learning exhibits higher performance
    • Forman-Ricci curvature
    • Ollivier-Ricci curvature
    • Fractal dimension

M. Cha, E.S.T. Emre, X. Xiao, J.-Y. Kim, P.B., S.J. VanEpps, A. Violi, N.A. Kotov, Unifying Structural Descriptors for Biological and Bioinspired Nanoscale Complexes, Nature Comp. Sci., 2022

40 of 55

Prediction of Protein–Protein Complexes


41 of 55

Quantifying Phase Transitions

  • Structural changes and fluctuations are difficult to quantify for phases with high asymmetry and multiscale organization
    • Long-range graph theory descriptors provide characterization of phase transitions in water / surfactant systems
  • Molecular dynamics simulation
    • Binary solutions of n-octyltrimethylammonium bromide (OTAB) in water
  • Phase diagrams
      • Spheroidal micelles (isotropic phase) 🡪 cylinders (hexagonal phase): x = 0.204
      • Cylinders (hexagonal phase) 🡪 lamellae (lamellar phase): x = 0.431


(Figure: simulation snapshots at x = 0.007, 0.190, 0.204, 0.409, 0.431, 0.930, spanning the micelles 🡪 cylinders and cylinders 🡪 lamellae transitions.)

42 of 55

Molecular Dynamics Descriptors

  • No discontinuity / inflection detected near phase transitions
      • Left: radial distribution function (RDF) for terminal tail beads
      • Middle: potential energy of tail-tail, tail-water, and head-water interactions
      • Right: free energy changes between real and ideal solutions

  • Average P2 order parameter exhibits plateaus for two phases
    • Highest gradient in P2, at x = 0.357, is supposed to indicate a phase transition
      • More sensitive to the change in micelle shape from spheroidal to cylindrical (taking place for x between 0.2 and 0.4) than to their packing


43 of 55

Graph Description of Complex Systems

  • Graph representation
    • Node = surfactant
    • Edge = pair of surfactants whose tail-tail Euclidean distance is smaller than 0.7 nm
  • Local / short-term measurement
    • Degree centrality
    • Clustering coefficient
  • Global / long-term measurement
    • Closeness centrality
    • Betweenness centrality
    • Node-based fractal dimension (NFD)
      • Measures self-similarity of a graph by calculating the spatial dimension of the expansion from a given node, based on a box-growing method
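The graph construction and one of the global measures can be sketched without any graph library; the 0.7 nm cutoff follows the text, while the toy coordinates are illustrative:

```python
import numpy as np
from collections import deque

def contact_graph(coords, cutoff=0.7):
    """Adjacency sets: node = surfactant, edge whenever the Euclidean
    tail-tail distance is below the cutoff (0.7 nm in the text)."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def closeness(adj, s):
    """Closeness centrality of node s within its connected component:
    (number of other reachable nodes) / (sum of BFS distances)."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Three tail beads in a line, 0.5 apart: only adjacent beads get edges
adj = contact_graph(np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]))
```

The middle bead has higher closeness than the ends, which is exactly the kind of distributional shift tracked across surfactant concentrations in the slides.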


44 of 55

GT Sheds Light on Phases

  • Phase transition from micellar to hexagonal phase at x = 0.204 is associated with sharp changes in
    • Closeness centrality (CC)
    • Betweenness centrality (BC)
    • Node-based fractal dimension (NFD)
  • Phase transition from hexagonal to lamellar phase at x = 0.431 exhibits a plateau in terms of CC, BC and NFD


45 of 55

Tracking Phase Transitions via GT Metrics

  • Micelles 🡪 cylinders (x = 0.204)
    • Closeness centrality and NFD change from relatively broad distributions to sharp peaks centered at 0.035 and 1.1
  • Cylinders 🡪 lamellae (x = 0.431)
    • Increase in the surfactant concentration broadens the distributions of closeness centrality and NFD, which display longer tails toward larger values


46 of 55

Structural Fluctuations & Phase Transitions

  • Coexistence of 2 phases near phase transition
  • Structural fluctuations between micelles and cylinders can be detected by closeness centrality


Colored by value of closeness centrality

Max - blue; Min - red

Micellar-dominant 🡪 mixture 🡪 hexagonal-dominant

47 of 55

Multifractality vs. Phase Transitions


48 of 55

Forman-Ricci Curvature (FRC) (I)

  • Simplicial Complexes
    • Combinatorial structures used to represent discrete data
      • Graphs
      • Networks
      • High-dimensional datasets
  • Discrete curvature applied to weighted graphs
    • Typically defined based on combinatorial Laplacian matrix of the graph
    • Captures how much an edge connects the neighborhood of the vertices it touches
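For unweighted graphs, the combinatorial Forman-Ricci curvature of an edge reduces to F(u, v) = 4 − deg(u) − deg(v), and one common convention for the augmented version adds 3 per triangle supported on the edge; a small sketch over adjacency sets:

```python
def forman_ricci(adj):
    """Edge curvatures of an unweighted graph given adjacency sets.
    Combinatorial Forman-Ricci: F(u, v) = 4 - deg(u) - deg(v).
    Augmented version credits the 2-simplices (triangles) on the edge:
    F#(u, v) = F(u, v) + 3 * #triangles(u, v)."""
    frc, afrc = {}, {}
    for u in adj:
        for v in adj[u]:
            if u < v:                          # each undirected edge once
                f = 4 - len(adj[u]) - len(adj[v])
                t = len(adj[u] & adj[v])       # common neighbours = triangles
                frc[(u, v)] = f
                afrc[(u, v)] = f + 3 * t
    return frc, afrc

# Triangle 0-1-2 with a pendant node 3 attached to 0
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
frc, afrc = forman_ricci(adj)
```

Edges inside dense, triangle-rich neighborhoods get their curvature pushed up by the augmentation, which is why the augmented FRC correlates more strongly with the Ollivier-Ricci curvature.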


49 of 55

Forman-Ricci Curvature (FRC) (II)


M.R. Znaidi,J. Sia, S. Ronquist, I. Rajapakse, E. Jonckheere & P. Bogdan. "A unified approach of detecting phase transition in time-varying complex networks." Scientific Reports 13, no. 1, 2023

50 of 55

Forman-Ricci Curvature (FRC) (III)

Advantage: It is far simpler to evaluate on large networks than Ollivier-Ricci curvature!

51 of 55

Augmented Forman-Ricci Curvature (FRC)


  • Augmented FRC
    • Accounts for 2D simplicial complexes arising in graphs
    • Observed correlation between the two discretizations (ORC-FRC) is even higher

Advantage: It is far simpler to evaluate on large networks than Ollivier-Ricci curvature!

52 of 55

Worker Activity in LFNNs

  • Leaders
    • Update their parameters based on global and local prediction losses
  • Followers
    • Update their parameters to align with leaders
  • Worker activity: neuron output before & after weight update

  • Patterned movement in an LFNN
    • Followers align with leaders


C. Yin et al., arXiv link: https://arxiv.org/pdf/2310.07885.pdf

53 of 55

Differential Geometry of Networks (I)

  • Ricci curvature [1, 2] (RC) in geometry

    • Measures the deviation of a manifold from being Euclidean in various tangential directions:
      • determined by averaging sectional curvatures (tangent planes)


[1] M. Weber et al. "Characterizing complex networks with Forman-Ricci curvature and associated geometric flows." Journal of Complex Networks 5.4 (2017)

[2] M. Weber et al. "Coarse geometry of evolving networks." Journal of Complex Networks 6.5 (2018)

Quantifies the volume growth


54 of 55

Differential Geometry of Networks (II)

  • Ricci curvature

    • Measures the deviation of a manifold from being Euclidean in various tangential directions:
      • determined by averaging sectional curvatures (tangent planes)



55 of 55

Differential Geometry of Networks (III)

  • Ricci curvature in geometry

    • Measures the deviation of a manifold from being Euclidean in various tangential directions:
      • determined by averaging sectional curvatures (tangent planes)


  • Ricci curvature in networks
    • Quantifies topological properties of graphs

  • Discretization
    • Ollivier-Ricci curvature (ORC)
    • Forman-Ricci curvature (FRC)