1 of 130

Data Science and Visualization in Health

Networks and Graphs

André Santanchè

Laboratory of Information Systems – LIS

Institute of Computing – UNICAMP

August 2025

2 of 130

Graphs?

3 of 130

Graph - Mathematical Model

Figure: a graph with nodes a, b, c, d; callouts mark the nodes and the edges.

4 of 130

Why Graphs?

5 of 130

Graphs and the Real-world

Challenge

Select one example of a real-world problem and represent it as a graph.

Define:

  • What are the nodes?
  • What are the edges?

6 of 130

The Emergence of Network Maps

  • Movie Actor Network, 1998
  • World Wide Web, 1999
  • C. elegans neural wiring diagram, 1990
  • Citation Network, 1998
  • Metabolic Network, 2000
  • PPI network, 2001

7 of 130

Political/Financial Networks

Mark Lombardi: tracked and mapped global financial fiascos in the 1980s and 1990s from public sources such as news articles.

Mark Lombardi, Industries Carlos Cardoen of Santiago, Chile c. 1982-90 (2nd Version), 2000

8 of 130

Understanding Through Visualization

"[...] Department of Homeland Security came in to take a look. They said they found the work revelatory, not because the financial and political connections he mapped were new to them, but because Lombardi showed them an elegant way to array disparate information and make sense of things, which they thought might be useful to their security efforts.[...]"

Michael Kimmelman, "Webs Connecting the Power Brokers, the Money and the World", NY Times, November 14, 2003

9 of 130

Understanding Through Visualization

"I happened to be in the Drawing Center when the Lombardi show was being installed and several consultants to the Department of Homeland Security came in to take a look. They said they found the work revelatory, not because the financial and political connections he mapped were new to them, but because Lombardi showed them an elegant way to array disparate information and make sense of things, which they thought might be useful to their security efforts. I didn't know whether to find that response comforting or alarming, but I saw exactly what they meant."

Michael Kimmelman, "Webs Connecting the Power Brokers, the Money and the World", NY Times, November 14, 2003

10 of 130

How Wolves Change Rivers

https://youtu.be/ysa5OBhXz-Q

  • Yellowstone National Park
  • Wolves → Elk → Vegetation → Erosion

→ Water → Beavers

→ Dams → Fish

11 of 130

Les Miserables

(Neo4j example)

12 of 130

Modeling the World as Graphs

13 of 130

Models

14 of 130

Models

  • Applications: Conceptual Model
  • Data Management: Logical Model
  • Data Storage: Physical Model

15 of 130

Models

  • Applications: Conceptual Model

16 of 130

member:

Doriana

book:

Dinolândia

loan

abstraction

made

loan

wrote

Dinolândia

domain of discourse

author:

Horário

authorship

17 of 130

Models

  • Applications: Conceptual Model
  • Data Management: Logical Model

18 of 130

The Bridges of Königsberg

19 of 130

The Bridges of Königsberg

Task

  • Draw a path that crosses every bridge exactly once.


20 of 130

The Bridges of Königsberg

  • Challenge
    • cross each bridge only once
  • Leonhard Euler in 1736
    • proved that the problem has no solution

21 of 130

Bridges as a Graph

22 of 130

Bridges as a Graph

Conceptual Model

Logical Model

23 of 130

Bridges as a Graph

  • Degree of all nodes must be even
    • except terminals (start/end)

Leonhard Euler in 1736
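
The degree condition above can be checked mechanically. The following is a minimal sketch, not Euler's original argument: the land-mass labels A to D are illustrative (A is the central island), and the function tests only the degree condition, assuming the graph is connected.

```python
from collections import Counter

# Seven bridges between four land masses.
# Labels A-D are illustrative assumptions; A is the central island.
bridges = [
    ("A", "B"), ("A", "B"),   # two bridges to the north bank
    ("A", "C"), ("A", "C"),   # two bridges to the south bank
    ("A", "D"),               # one bridge to the eastern island
    ("B", "D"), ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_path(edges):
    """Degree condition only: zero or two nodes of odd degree.

    Assumes the graph is connected.
    """
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))
print(has_euler_path(bridges))
```

All four land masses have odd degree, so more than two nodes would have to act as terminals, which is why no such walk exists.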

24 of 130

Simple

Principles

Complex

Systems

25 of 130

A Homogeneous Graph

26 of 130

The Eight Cities

Images generated by Gemini.

27 of 130

The Eight Cities

28 of 130

The Eight Cities as a Graph

  • Nodes are cities
  • Edges are roads
  • All the cities have the same size and population
  • All the roads have the same length

29 of 130

The Eight Cities as a Graph

Challenge

  • Representing in Tables
  • How do you represent a graph in a spreadsheet?
    • You can use more than one tab

30 of 130

Models

  • Applications: Conceptual Model
  • Data Management: Logical Model
  • Data Storage: Physical Model

31 of 130

Mapping Logical to Physical

Logical

Conceptual

Graph

Stored as a Graph

mapping

32 of 130

Mapping Logical to Physical

Logical

Conceptual

Graph

Stored as a Table

mapping

33 of 130

Graph - Mathematical Model

Figure: a graph with nodes a, b, c, d; callouts mark the nodes and the edges.

34 of 130

Graph - Mathematical Model

  • Graph
    • pair of sets G=(V,E)

Figure: a graph with nodes a, b, c, d; callouts mark the nodes (V) and the edges (E).

35 of 130

Undirected Graph - Mathematical Model

  • Graph
    • pair of sets G=(V,E)
  • Example
    • V = {a, b, c, d}
    • E = { {a,b}, {a,c}, {a,d}, {b,d}, {c,d} }

Figure: an undirected graph with nodes a, b, c, d; callouts mark the nodes (V) and the edges (E).
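
As a minimal sketch, the undirected example G=(V,E) can be written down directly with Python sets. Storing each edge as a `frozenset` is one possible choice (an assumption of this sketch) that makes {a,b} and {b,a} the same edge; the helper names `neighbors` and `degree` are illustrative.

```python
# Undirected example graph: edges are unordered pairs,
# so each one is stored as a frozenset.
V = {"a", "b", "c", "d"}
E = {
    frozenset(pair)
    for pair in [("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")]
}

def neighbors(node):
    """Nodes that share an edge with `node`, in alphabetical order."""
    return sorted(
        other for edge in E if node in edge for other in edge if other != node
    )

def degree(node):
    """Number of edges incident to `node`."""
    return len(neighbors(node))

print(neighbors("d"))
print({v: degree(v) for v in sorted(V)})
```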

36 of 130

Directed Graph - Mathematical Model

  • Graph
    • pair of sets G=(V,E)
  • Example
    • V = {a, b, c, d}
    • E = { (a,b), (a,c), (d,a), (d,b), (d,c) }

Figure: a directed graph with nodes a, b, c, d; callouts mark the nodes (V) and the edges (E).
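
In the directed case each edge is an ordered pair, so a tuple (source, target) is a natural fit. The sketch below uses the example sets above; the helper names `out_degree` and `in_degree` are illustrative.

```python
# Directed example graph: each edge is an ordered pair (source, target).
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("a", "c"), ("d", "a"), ("d", "b"), ("d", "c")}

def out_degree(node):
    """Number of edges leaving `node`."""
    return sum(1 for source, _ in E if source == node)

def in_degree(node):
    """Number of edges arriving at `node`."""
    return sum(1 for _, target in E if target == node)

for v in sorted(V):
    print(v, "out:", out_degree(v), "in:", in_degree(v))
```

Unlike the undirected case, (a,b) being in E says nothing about (b,a).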

37 of 130

Undirected Graph - Adjacency Matrix

Figure: undirected graph with nodes a, b, c, d.

     a  b  c  d
a    0  1  1  1
b    1  0  0  1
c    1  0  0  1
d    1  1  1  0

38 of 130

Directed Graph - Adjacency Matrix

Figure: directed graph with nodes a, b, c, d.

     a  b  c  d
a    0  1  1  0
b    0  0  0  0
c    0  0  0  0
d    1  1  1  0
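
Both adjacency matrices can be derived from the edge sets of the undirected and directed examples. This is a small sketch using plain nested lists; the function name `adjacency_matrix` and its `directed` flag are illustrative, not part of any library.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build a 0/1 matrix; row = source node, column = target node."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
        if not directed:
            # An undirected edge is visible from both endpoints.
            matrix[index[target]][index[source]] = 1
    return matrix

nodes = ["a", "b", "c", "d"]
undirected = adjacency_matrix(
    nodes, [("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")]
)
directed = adjacency_matrix(
    nodes, [("a", "b"), ("a", "c"), ("d", "a"), ("d", "b"), ("d", "c")],
    directed=True,
)

for row in undirected:
    print(row)
for row in directed:
    print(row)
```

The undirected matrix is symmetric, while the directed one generally is not.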

39 of 130

Directed Graph - Edge List

Figure: directed graph with nodes a, b, c, d.

source  target
a       b
a       c
d       a
d       b
d       c
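
An edge list maps directly onto a two-column table, which is how it would look in a spreadsheet tab or CSV file. The sketch below assumes the table is available as CSV text with the headers `source` and `target`, and turns it into a successor list per node using only the standard library.

```python
import csv
import io
from collections import defaultdict

# The directed edge list as it might be exported from a spreadsheet tab.
edge_csv = """source,target
a,b
a,c
d,a
d,b
d,c
"""

# Map each source node to the list of nodes it points to.
successors = defaultdict(list)
for row in csv.DictReader(io.StringIO(edge_csv)):
    successors[row["source"]].append(row["target"])

print(dict(successors))
```

Nodes without outgoing edges (b and c here) do not appear as keys, which is one reason a separate node table is often kept alongside the edge list.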

40 of 130

The Eight Cities as a Graph

Challenge

  • Representing in Tables
  • How do you represent a graph in a spreadsheet?
    • You can use more than one tab

41 of 130

Logical to Physical Model

Rot Donnadd

Pid Mught

Thulk Lebbimp

Bouvossam Damme

Pirg Zall

42 of 130

Logical to Physical Model

Rot Donnadd

Pid Mught

Thulk Lebbimp

Bouvossam Damme

Pirg Zall

Logical

Graph

Stored as a Graph

mapping

43 of 130

Logical to Physical Model

Name
Rot Donnadd
Pid Mught
Thulk Lebbimp
Bouvossam Damme
Pirg Zall

Source           Target
Rot Donnadd      Pid Mught
Rot Donnadd      Thulk Lebbimp
Pid Mught        Rot Donnadd
Pid Mught        Pirg Zall
Pid Mught        Bouvossam Damme
Thulk Lebbimp    Rot Donnadd
Thulk Lebbimp    Pirg Zall
Bouvossam Damme  Pid Mught
Bouvossam Damme  Pirg Zall
Pirg Zall        Bouvossam Damme
Pirg Zall        Pid Mught
Pirg Zall        Thulk Lebbimp

Logical

Graph

Stored as a Table

mapping
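
The two-table layout above (one table for nodes, one for edges) can be loaded back into a graph. This is a sketch under the assumption that each road is undirected and therefore listed once per direction; it checks that assumption and counts the distinct roads among the five cities shown.

```python
# Node table: one row per city.
names = ["Rot Donnadd", "Pid Mught", "Thulk Lebbimp", "Bouvossam Damme", "Pirg Zall"]

# Edge table: one row per (source, target) pair.
rows = [
    ("Rot Donnadd", "Pid Mught"), ("Rot Donnadd", "Thulk Lebbimp"),
    ("Pid Mught", "Rot Donnadd"), ("Pid Mught", "Pirg Zall"),
    ("Pid Mught", "Bouvossam Damme"),
    ("Thulk Lebbimp", "Rot Donnadd"), ("Thulk Lebbimp", "Pirg Zall"),
    ("Bouvossam Damme", "Pid Mught"), ("Bouvossam Damme", "Pirg Zall"),
    ("Pirg Zall", "Bouvossam Damme"), ("Pirg Zall", "Pid Mught"),
    ("Pirg Zall", "Thulk Lebbimp"),
]

edge_set = set(rows)

# Every road should appear in both directions if the graph is undirected.
symmetric = all((target, source) in edge_set for source, target in edge_set)

# Collapse both directions into one undirected road.
roads = {frozenset(pair) for pair in rows}

# Every endpoint must exist in the node table.
dangling = {city for pair in rows for city in pair} - set(names)

print(symmetric, len(roads), dangling)
```

Storing both directions doubles the number of rows but lets a simple lookup on the source column find all neighbors of a city.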

44 of 130

45 of 130

46 of 130

Knowledge Graph

RAS

B-RAF

C-RAF

MEK

ERK

PLX4072

cell adhesion

cell-cell adhesion

cell-cell adhesion mediated by cadherin

cell-cell adhesion via plasma-membrane adhesion molecules

calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules

cellular process

biological process

developmental process

cell morphogenesis

Phenomena Graph

47 of 130

Knowledge Graph

cell adhesion

cell-cell adhesion

cell-cell adhesion mediated by cadherin

cell-cell adhesion via plasma-membrane adhesion molecules

calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules

cellular process

biological process

developmental process

cell morphogenesis

48 of 130

Machines talking to Machines

49 of 130

The Semantic Web: Machines talking to Machines

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 28–37.

50 of 130

Case Study

Chronic Myeloid Leukemia (CML)

51 of 130

Philadelphia Chromosome

52 of 130

Myeloid Stem Cell

By Cancer Research UK - Original email from CRUK, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=34334439

53 of 130

CML in a 4-year-old female

54 of 130

Cell?

"Cell of Saint Teresa de Ávila in the Convent of Saint Joseph" (Wikipedia 2022)

"... is the process by which a cell uses its plasma membrane to engulf a large particle"

55 of 130

Cell?

"Cell of Saint Teresa de Ávila in the Convent of Saint Joseph" (Wikipedia, 2022)

"... is the process by which a cell uses its plasma membrane to engulf a large particle" (Wikipedia, 2022)

56 of 130

Explicit Semantics

57 of 130

Human Perspective

58 of 130

Machine Pattern Recognition

(e.g., by machine learning)

59 of 130

Semantic Web

60 of 130

Formal Knowledge

61 of 130

WordNet

62 of 130

Ontology

“An ontology is a formal, explicit specification of a shared conceptualisation.” (Studer et al., 1998)

63 of 130

Conceptualization

64 of 130

Formal, Explicit Specification

65 of 130

Ontology Spectrum

(Welty et al., 1999)

66 of 130

Starting from a Concept

Chronic Myeloid Leukemia (CML)

67 of 130

Connecting Concepts

Chronic Myeloid Leukemia (CML)

Leukemia

is a

68 of 130

Knowledge Graph

69 of 130

Knowledge Graph

Chronic Myeloid Leukemia (CML)

Leukemia

is a

70 of 130

Knowledge Graph

Chronic Myeloid Leukemia (CML)

Leukemia

is a

concept

relationship between concepts

71 of 130

Concept

nucleus

72 of 130

Related Concepts

nucleus

intracellular anatomical structure

part of

73 of 130

Knowledge Graph

nucleus

intracellular anatomical structure

part of

concept

relationship between concepts

74 of 130

Knowledge Graph

75 of 130

Tent of Miracles

Jorge Amado

author

Itabuna

birthPlace

DBpedia

resource          property    value
Tent of Miracles  author      Jorge Amado
Jorge Amado       birthPlace  Itabuna
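
The resource/property/value rows above are triples, and chaining them is what makes a knowledge graph queryable. The sketch below keeps the triples in a plain Python list (no RDF library is assumed) and follows two hops: from the book to its author, then to the author's birthplace. The helper name `objects` is illustrative.

```python
# Each fact is a (resource, property, value) triple.
triples = [
    ("Tent of Miracles", "author", "Jorge Amado"),
    ("Jorge Amado", "birthPlace", "Itabuna"),
]

def objects(subject, prop):
    """All values v such that (subject, prop, v) is a known triple."""
    return [o for s, p, o in triples if s == subject and p == prop]

# Two-hop question: where was the author of "Tent of Miracles" born?
authors = objects("Tent of Miracles", "author")
birthplaces = [place for a in authors for place in objects(a, "birthPlace")]

print(birthplaces)
```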

76 of 130

Knowledge Graph

(Ji et al., 2022)

77 of 130

Case Study

Human Symptom-Disease Network

78 of 130

Zhou, X., Menche, J., Barabási, A.-L., & Sharma, A. (2014). Human symptoms–disease network. Nature Communications, 5(1), 4212. https://doi.org/10.1038/ncomms5212

79 of 130

Clustered Regions - Same Disease Category

80 of 130

81 of 130

82 of 130

Diseases and Symptoms Table

disease                 symptom                  association
Influenza               Fever                    136
Influenza               Headache                 17
Influenza               Fatigue                  5
Myocardial Infarction   Chest Pain               1005
Myocardial Infarction   Fatigue                  74
Diabetes Complications  Obesity                  126
Diabetes Complications  Albuminuria              45
Diabetes Complications  Acute Coronary Syndrome  19
Diabetes Complications  Diarrhea                 7
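
The disease/symptom table can be read as a weighted bipartite graph: diseases on one side, symptoms on the other, and the association value as the edge weight. The sketch below is only an illustration of that reading, not the method of the cited paper; the helper names are hypothetical.

```python
from collections import defaultdict

# (disease, symptom, association) rows from the table.
rows = [
    ("Influenza", "Fever", 136),
    ("Influenza", "Headache", 17),
    ("Influenza", "Fatigue", 5),
    ("Myocardial Infarction", "Chest Pain", 1005),
    ("Myocardial Infarction", "Fatigue", 74),
    ("Diabetes Complications", "Obesity", 126),
    ("Diabetes Complications", "Albuminuria", 45),
    ("Diabetes Complications", "Acute Coronary Syndrome", 19),
    ("Diabetes Complications", "Diarrhea", 7),
]

# disease -> {symptom: association weight}
symptoms_of = defaultdict(dict)
for disease, symptom, weight in rows:
    symptoms_of[disease][symptom] = weight

def shared_symptoms(d1, d2):
    """Symptoms linked to both diseases (a two-hop path in the graph)."""
    return set(symptoms_of[d1]) & set(symptoms_of[d2])

def strongest_symptom(disease):
    """Symptom with the highest association value for a disease."""
    return max(symptoms_of[disease], key=symptoms_of[disease].get)

print(shared_symptoms("Influenza", "Myocardial Infarction"))
print(strongest_symptom("Diabetes Complications"))
```

Two diseases become neighbors in a disease-disease network whenever they share at least one symptom.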

83 of 130

84 of 130

Gene Ontology

85 of 130

mitochondrial DNA repair

GO:0043504

86 of 130

Ontologies

GO:0016055

Wnt signaling pathway

frizzled signaling pathway

Wnt signaling pathway

rdfs:label

obo:ExactSynonym

GO:0060071

Wnt signaling pathway, planar cell polarity pathway

taxon:9606

up:Q7Z3G6

up_core:organism

Prickle-like protein 2

rdfs:label

Homo sapiens

up_core:scientificName

rdfs:subClassOf

up_core:classifiedWith

87 of 130

Enrichment

Wang, Y., & Iha, H. (2023). The Novel Link between Gene Expression Profiles of Adult T-Cell Leukemia/Lymphoma Patients’ Peripheral Blood Lymphocytes and Ferroptosis Susceptibility. Genes, 14(11), 2005. https://doi.org/10.3390/GENES14112005

88 of 130

Chandak, P., Huang, K., & Zitnik, M. (2023). Building a knowledge graph to enable precision medicine. Scientific Data 2023 10:1, 10(1), 1–16. https://doi.org/10.1038/s41597-023-01960-3

89 of 130

Overview of PrimeKG

90 of 130

Downloading Datasets

91 of 130

92 of 130

93 of 130

Task

Given this knowledge graph (PrimeKG), what kind of inferences can I make with it?


94 of 130

RAS

B-RAF

C-RAF

MEK

ERK

PLX4072

Phenomena Graph

95 of 130

Protein-Protein Interaction

Protein

Protein

Interaction

RAS

B-RAF

C-RAF

MEK

ERK

PLX4072

Case Study - Melanoma

96 of 130

Tables as Networks

97 of 130

Table to Graph

Nodes

98 of 130

Graph

Table

id/prop1  prop2   prop3
value1    value2  value3
value1    value2  value3
value1    value2  value3
value1    value2  value3
value1    value2  value3

99 of 130

Graph

Table

id/prop1  prop2   prop3
value1    value2  value3
value1    value2  value3
value1    value2  value3
value1    value2  value3
value1    value2  value3

node (prop1: value1, prop2: value2, prop3: value3)

node (prop1: value1, prop2: value2, prop3: value3)

100 of 130

Graph

Table

#node  identifier            degree
ABL1   9606.ENSP00000361423  20
AKT1   9606.ENSP00000451828  24
AKT2   9606.ENSP00000375892  20
AKT3   9606.ENSP00000500582  20
ARAF   9606.ENSP00000290277  10

101 of 130

Graph

Table

#node (name)  identifier            degree
ABL1          9606.ENSP00000361423  20
AKT1          9606.ENSP00000451828  24
BRAF          9606.ENSP00000419060  12
CBL           9606.ENSP00000264033  17

:Protein (name: AKT1, identifier: 9606…28)

:Protein (name: BRAF, identifier: 9606…60)
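
Turning a node table into graph nodes means one node per row, with the columns becoming properties. The sketch below assumes the table is tab-separated text with the header `#node`, `identifier`, `degree`, and represents each node as a plain dictionary carrying the `Protein` label; a graph database would have its own import mechanism.

```python
import csv
import io

# Node table as tab-separated text.
node_tsv = (
    "#node\tidentifier\tdegree\n"
    "ABL1\t9606.ENSP00000361423\t20\n"
    "AKT1\t9606.ENSP00000451828\t24\n"
    "BRAF\t9606.ENSP00000419060\t12\n"
    "CBL\t9606.ENSP00000264033\t17\n"
)

# One node per row; the "#node" column becomes the "name" property.
nodes = {}
for row in csv.DictReader(io.StringIO(node_tsv), delimiter="\t"):
    nodes[row["#node"]] = {
        "label": "Protein",
        "name": row["#node"],
        "identifier": row["identifier"],
        "degree": int(row["degree"]),
    }

print(nodes["AKT1"])
```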

102 of 130

Table to Graph

Edges

103 of 130

Graph

Table

origin  target  prop1   prop2   prop3
id1     id2     value1  value2  value3
id1     id2     value1  value2  value3
id1     id2     value1  value2  value3
id1     id2     value1  value2  value3
id1     id2     value1  value2  value3

node

edge (prop1: value1, prop2: value2, prop3: value3)

node

104 of 130

Graph

Table

#node1  node2   experimentally_determined_interaction (experimental)  database_annotated (database)
AKT1    BRAF    0.631                                                 0
AKT1    PIK3R1  0.558                                                 0.9
ARAF    MAPK1   0.11                                                  0.5
ARAF    KRAS    0.745                                                 0.9

:Protein (name: AKT1, identifier: 9606…28)

edge (experimental: 0.631, database: 0)

:Protein (name: BRAF, identifier: 9606…60)
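
Each row of the edge table becomes one edge whose remaining columns are edge properties. The sketch below reads the first protein of the second row as AKT1 and uses the short property names `experimental` and `database`; the 0.5 cutoff is a hypothetical threshold chosen only to show filtering by an edge property.

```python
# (node1, node2, experimental score, database score) rows from the table.
edge_rows = [
    ("AKT1", "BRAF", 0.631, 0.0),
    ("AKT1", "PIK3R1", 0.558, 0.9),
    ("ARAF", "MAPK1", 0.11, 0.5),
    ("ARAF", "KRAS", 0.745, 0.9),
]

# One edge per row; the score columns become edge properties.
edges = [
    {"source": a, "target": b, "experimental": exp, "database": db}
    for a, b, exp, db in edge_rows
]

THRESHOLD = 0.5  # hypothetical cutoff, not taken from the slides

# Keep only edges with enough experimental support.
strong = [
    (e["source"], e["target"]) for e in edges if e["experimental"] >= THRESHOLD
]

print(strong)
```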

105 of 130

Denormalization

#node1  node1.identifier      node2   node2.identifier      experimentally_determined_interaction (experimental)  database_annotated (database)
AKT1    9606.ENSP00000451828  BRAF    9606.ENSP00000419060  0.631                                                 0
AKT1    9606.ENSP00000451828  PIK3R1                        0.558                                                 0.9
ARAF                          MAPK1                         0.11                                                  0.5
ARAF                          KRAS                          0.745                                                 0.9
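
Denormalization here amounts to joining the edge table with the node table, so that node properties are repeated on every edge row. The sketch below uses only the two identifiers that the denormalized table shows (AKT1 and BRAF); proteins without a known identifier come back as `None`, and the dictionary keys are illustrative.

```python
# Node properties known in this sketch (name -> identifier).
identifiers = {
    "AKT1": "9606.ENSP00000451828",
    "BRAF": "9606.ENSP00000419060",
}

# (node1, node2, experimental, database) edge rows.
edges = [
    ("AKT1", "BRAF", 0.631, 0.0),
    ("AKT1", "PIK3R1", 0.558, 0.9),
]

# Join: copy each endpoint's identifier onto the edge row.
denormalized = [
    {
        "node1": a,
        "node1.identifier": identifiers.get(a),
        "node2": b,
        "node2.identifier": identifiers.get(b),
        "experimental": exp,
        "database": db,
    }
    for a, b, exp, db in edges
]

for row in denormalized:
    print(row)
```

The AKT1 identifier now appears once per edge instead of once in the node table, which is the redundancy that denormalization trades for simpler single-table processing.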

106 of 130

107 of 130

Database of known and predicted protein-protein interactions

108 of 130

Search for Melanoma

109 of 130

Selecting Dataset

110 of 130

Selecting Dataset

111 of 130

CML in KEGG

112 of 130

CML in STRING

113 of 130

Labels

114 of 130

Showing just Experimental Physical Interactions

115 of 130

CML in STRING (Experimental Physical Interactions)

116 of 130

Exporting

117 of 130

118 of 130

Cytoscape

119 of 130

Melanoma - Cytoscape

120 of 130

Task

Given this protein interaction graph, what kind of analyses can I perform with it?


RAS

B-RAF

C-RAF

MEK

ERK

PLX4072

121 of 130

Discovery

122 of 130

Roads Flow

Castelo, Campinas (OpenStreetMap, 2015)

123 of 130

Food Web - FishBase

Freshwater food web: Neo Martinez and Richard Williams.

(Cavoto et al., 2015)

124 of 130

Emergence

  • Birds and Fishes
  • Brain and Consciousness

125 of 130

Knowledge Graph

RAS

B-RAF

C-RAF

MEK

ERK

PLX4072

cell adhesion

cell-cell adhesion

cell-cell adhesion mediated by cadherin

cell-cell adhesion via plasma-membrane adhesion molecules

calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules

cellular process

biological process

developmental process

cell morphogenesis

Phenomena Graph

Discovery

Inference

126 of 130

RAS

B-RAF

C-RAF

MEK

ERK

PLX4072

cell adhesion

cell-cell adhesion

cell-cell adhesion mediated by cadherin

cell-cell adhesion via plasma-membrane adhesion molecules

calcium-dependent cell-cell adhesion via plasma membrane cell adhesion molecules

cellular process

biological process

developmental process

cell morphogenesis

Our Approach

Discovery

Inference

127 of 130

References

  • Cavoto, P., Cardoso, V., Lebbe, R. V., & Santanche, A. (2015). FishGraph: A network-driven data analysis. Proceedings - 11th IEEE International Conference on EScience, EScience 2015. https://doi.org/10.1109/eScience.2015.61

128 of 130

Acknowledgements

129 of 130

130 of 130

License and Acknowledgements

  • These slides are shared under a Creative Commons license with the following conditions: Attribution, Noncommercial and Share Alike. See further details at https://creativecommons.org/licenses/by-nc-sa/4.0/
  • Thanks to California Academy of Sciences [https://www.flickr.com/photos/casgeology/8981509235/] for its images adopted in the cover and background of the slides. See its specific license on the site.