2 of 30

Biology as a System

Understanding biology holistically and integratively — structures, functions, and interactions.

Biological systems are complex: they are difficult to model, and their responses to perturbations or interventions are difficult to predict.

3 of 30

Modelling Biological Systems

Not walking away from complexity

Holistic understanding — Studying biology as systems allows us to understand the complex interactions and dependencies among biological components, beyond isolated parts
Integrative approach — Aggregating data from multiple disciplines and technologies helps to better understand life processes
Emergent properties analysis — Many biological phenomena cannot be understood by studying components individually but emerge from their interactions
Real-world applications — A systems perspective aims to model complex global challenges

4 of 30

Network Biology

Helps us to visualise and analyse complex biological systems as networks of interactions (e.g., protein-protein, gene regulatory, or metabolic pathways), aiding in understanding their structure and function
Networks integrate diverse biological datasets (e.g., multi-omics data such as genomics and proteomics) into interpretable frameworks, enabling researchers to uncover patterns that might otherwise be overlooked
Researchers can use network properties to identify critical nodes or pathways disrupted in specific conditions like disease or define how biological systems maintain function, guiding strategies to enhance resilience or address fragility
Computational models using network structures allow prediction of outcomes, reducing reliance on trial-and-error experiments

5 of 30

What is a Graph/Network?

Data structures of components (nodes) connected by relationships (edges)
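As a minimal sketch of this definition, a graph can be held in memory as an adjacency list that maps each node to the set of its neighbours. The node names (`P1` to `P4`) and the edges are hypothetical placeholders, not real interactions.

```python
from collections import defaultdict

# Hypothetical edges between placeholder components (not real interactions).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]

# Undirected graph as an adjacency list: node -> set of neighbouring nodes.
adjacency = defaultdict(set)
for source, target in edges:
    adjacency[source].add(target)
    adjacency[target].add(source)

nodes = sorted(adjacency)
print("nodes:", nodes)
print("neighbours of P1:", sorted(adjacency["P1"]))
```

An adjacency list is only one possible representation; an edge list or an adjacency matrix stores the same information.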

Social networks

Biological networks

7 of 30

Why Graphs?

These structures allow:

Quick integration of heterogeneous data based on relationships
Analysis and interpretation of data with graph theory methods, e.g., topological properties can be used to explain:

The possible role of specific components
The flow of information
The robustness of the system

Visualisation of data

8 of 30

How to Analyse Graph Structures

Using and Analysing Relationships

Graph Theory: algorithms that allow you to extract relevant information from the topology of the graph.

Topological Features: Centrality, degree, clustering, etc.

Graph Machine Learning:

Embeddings
Graph Neural Networks

https://huggingface.co/blog/intro-graphml

9 of 30

Graph Theory — Some Topological Properties

Topological properties can help extract meaningful information and identify relevant structures within the network

Graph Theory Measures and Their Application to Neurosurgical Eloquence. Cancers 2023

https://timbr.ai/community-detection-algorithm/

Degree: The number of connections (edges) a node has in the network. Identifies highly connected nodes (hubs) that often represent critical molecules or interactions, such as essential proteins or highly expressed genes.
Path length: The average shortest-path length over all pairs of nodes. Indicates how efficiently signals can travel across the network, which is relevant, for instance, in signaling pathways.
Shortest Path: The minimum number of edges required to traverse between two nodes. Essential for studying signal transduction, metabolic fluxes, and the efficiency of molecular or ecological interactions. Nodes with many shortest paths passing through them often have critical roles in the system.
Clustering Coefficient: Measures the tendency of nodes to form tightly knit groups. High clustering often signifies functional specialization, such as protein-protein interaction clusters in cellular compartments.
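The four properties above can be computed with a few lines of plain Python. The sketch below assumes an undirected, unweighted, connected toy network; the graph and its node names are hypothetical, and the function names are illustrative rather than taken from any library.

```python
from collections import deque

# Hypothetical undirected toy network; node names are placeholders.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def degree(g, node):
    """Number of edges attached to a node."""
    return len(g[node])


def shortest_path_lengths(g, source):
    """Breadth-first search: minimum number of edges from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in g[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def average_path_length(g):
    """Mean shortest-path length over all pairs of distinct nodes (connected graph assumed)."""
    total, pairs = 0, 0
    for source in g:
        for target, d in shortest_path_lengths(g, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


def clustering_coefficient(g, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = list(g[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in g[neighbours[i]]
    )
    return 2 * links / (k * (k - 1))


print(degree(graph, "A"), average_path_length(graph), clustering_coefficient(graph, "B"))
```

In this toy network, `A` is the hub, and `B` and `C` sit in a fully connected triangle with `A`, so their clustering coefficient is the highest possible.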

10 of 30

Some Topological Features

Centrality — set of metrics that determine the importance or influence of a node within a network. Different centrality measures highlight different aspects of importance based on the network’s structure

Betweenness Centrality: Reflects the frequency with which a node appears on the shortest paths between other nodes. Important for identifying key regulators or bottlenecks in pathways.
Closeness Centrality: Measures how close a node is to all other nodes, indicating its ability to quickly interact or influence others.
Degree Centrality: Identifies the most connected nodes, which may play pivotal roles in stability or robustness.
Eigenvector Centrality: Considers the influence of a node based on the importance of its neighbors, helping locate influential components in signaling or metabolic networks.
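A rough sketch of the four centrality measures on a hypothetical toy network, in plain Python. Betweenness is computed with Brandes' algorithm and left unnormalised. Eigenvector centrality uses power iteration on the adjacency matrix plus the identity, an assumption made here so that the iteration converges on any connected graph without changing the ranking. All names are illustrative.

```python
from collections import deque

# Hypothetical undirected toy network; node names are placeholders.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def bfs_distances(g, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(g):
    """Fraction of the other nodes that each node is directly connected to."""
    return {v: len(g[v]) / (len(g) - 1) for v in g}


def closeness_centrality(g):
    """Number of reachable nodes divided by the sum of distances to them."""
    scores = {}
    for v in g:
        dist = bfs_distances(g, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(g):
    """Brandes' algorithm: number of shortest paths passing through each node."""
    score = {v: 0.0 for v in g}
    for s in g:
        order, dist = [], {s: 0}
        preds = {v: [] for v in g}
        sigma = {v: 0 for v in g}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def eigenvector_centrality(g, iterations=100):
    """Power iteration on (adjacency + identity), scaled so the maximum is 1."""
    score = {v: 1.0 for v in g}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[w] for w in g[v]) for v in g}
        top = max(new.values())
        score = {v: x / top for v, x in new.items()}
    return score


print(betweenness_centrality(graph))
```

Here `D` has a low degree but a high betweenness, because every shortest path to `E` passes through it: the measures highlight different kinds of importance.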

Community — clusters or groups of nodes within a network that are more densely connected to each other than to nodes outside the group. These clusters, or communities, are also known as modules or sub-networks
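One common way to quantify "more densely connected to each other than to nodes outside the group" is the modularity score of a partition. The sketch below is an illustration rather than a community detection algorithm: it only scores a partition that is already given. The toy graph (two triangles joined by one edge) and the function name are hypothetical.

```python
# Hypothetical toy network: two triangles (A, B, C) and (D, E, F) joined by the edge C-D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}


def modularity(g, communities):
    """Modularity Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = sum(len(neighbours) for neighbours in g.values()) / 2  # total number of edges
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both of its endpoints, hence the division by 2.
        internal = sum(1 for v in community for w in g[v] if w in community) / 2
        degree_sum = sum(len(g[v]) for v in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


good_split = [{"A", "B", "C"}, {"D", "E", "F"}]
poor_split = [{"A", "D"}, {"B", "C", "E", "F"}]
print(modularity(graph, good_split), modularity(graph, poor_split))
```

Community detection methods search for the partition that maximises a score of this kind; putting every node into one community always gives a modularity of zero.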

Graph Theory Measures and Their Application to Neurosurgical Eloquence. Cancers 2023

https://timbr.ai/community-detection-algorithm/

11 of 30

Graphs in Biology

https://towardsdatascience.com/umap-for-data-integration-50b5cfa4cdcd

http://snap.stanford.edu/deepnetbio-ismb/ipynb/Human+Disease+Network.html

https://cytoscape.org/cytoscape-tutorials/presentations/ppi-tools1-2017-mpi.html#/

https://en.wikipedia.org/wiki/Metabolic_network

https://www.scienceandfood.org/the-flavor-network/

12 of 30

Omics Networks

13 of 30

Types of Omics

Genomics: Study of the genome, which includes all DNA within an organism

Sequence, structure, and function of genes
Key technology — Next-generation sequencing (NGS)

Transcriptomics: Study of the transcriptome, which is the complete set of RNA transcripts

Gene expression and regulation
Key technology — RNA sequencing (RNA-seq)

Proteomics: Study of the proteome, or the complete set of proteins in a cell or organism

Protein structure, function, interactions, and modifications
Key technology — Mass spectrometry (MS)

Metabolomics: Study of the metabolome, which includes all small-molecule metabolites in a cell or biological system

Cellular processes and metabolic pathways
Key technology — Mass spectrometry (MS)

Metaomics: Study of the collective genetic material, proteins, metabolites, and other molecular components from entire communities of organisms in a specific environment, without needing to isolate or culture individual species.

[Figure labels: Genome, Transcriptome, Proteome, Metabolome]

14 of 30

Omics Data

[Figure labels: samples; genes, transcripts, proteins, metabolites; API, Web, Doc, Visualisation, Reporting, Analytics]

15 of 30

Omics Data

[Figure labels: samples; genes, transcripts, proteins, metabolites; Graphs]

16 of 30

How to Build a Network

Data to Graph

Data sources

Correlation-based networks: constructed by calculating pairwise correlations between entities based on their expression profiles across multiple conditions, time points, or samples, e.g., weighted gene co-expression network analysis (WGCNA) and co-abundance networks
Knowledge-based approaches: networks built by integrating heterogeneous data from multiple sources, also called knowledge graphs
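A minimal sketch of the correlation-based approach, assuming Pearson correlation and a hard cut-off on its absolute value. This is a simplification and not WGCNA itself, which uses soft thresholding. The abundance profiles, the protein names, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical abundance profiles of four proteins across five samples.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "P2": [2.1, 3.9, 6.2, 8.0, 9.8],
    "P3": [5.0, 4.1, 2.9, 2.2, 1.0],
    "P4": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_network(data, threshold=0.9):
    """Keep an edge for every pair whose absolute correlation reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r  # the correlation is kept as the edge weight
    return edges


network = correlation_network(profiles)
for (a, b), r in sorted(network.items()):
    print(a, b, round(r, 3))
```

Keeping the sign of the correlation as the edge weight makes it possible to distinguish co-regulated pairs from anti-correlated ones later on.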

17 of 30

How to Build a Network

Starting point

[Diagram labels: proteins by samples matrix; correlation analysis; differential regulation analysis; proteins; Protein-protein Interaction network; functional enrichment; correlation network; functions and proteins; functional enrichment network; knowledge graph]

18 of 30

Knowledge Graphs

19 of 30

What is a Knowledge Graph (KG)

Relationships first, everything else second

A way to organise knowledge/information by defining associations or relationships
These relationships facilitate integration, management and enrichment of data
The objective when setting up a KG:

Standardisation / FAIRification
Reusability
Interpretability
Automation
Representation/Visualisation

The Knowledge Graph Cookbook. Andreas Blumauer and Helmut Nagy. 2020

20 of 30

Knowledge Graph vs Graph Database

https://snap.stanford.edu/graphlearning-workshop/slides/stanford_graph_learning_Biomedicine.pdf

21 of 30

Knowledge Graph vs Graph Database

https://snap.stanford.edu/graphlearning-workshop/slides/stanford_graph_learning_Biomedicine.pdf

Focus on data integration to represent complex biological systems and be able to reason over them

22 of 30

Building a Knowledge Graph

Define the questions you want to answer
Define what data can be used to answer these questions and how it is linked — Data model
Find where to get these data
Get the data, standardise it and format it
Generate the graph
Query the graph to answer the questions
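The last three steps can be sketched on a toy scale as follows. The records, identifiers, and relation names are invented for illustration, and upper-casing is only a stand-in for real standardisation, which would typically map names to shared identifiers or ontology terms.

```python
# Step 4: hypothetical raw records, as if collected from two different sources.
raw_records = [
    {"subject": "DrugX", "relation": "targets", "object": " p001"},
    {"subject": "P001", "relation": "part_of", "object": "PathwayA"},
    {"subject": "drugx ", "relation": "TARGETS", "object": "P001"},
]


def standardise(record):
    """Trim whitespace and upper-case each field so equivalent records match."""
    return tuple(record[key].strip().upper() for key in ("subject", "relation", "object"))


def generate_graph(records):
    """Step 5: the graph as a set of (subject, relation, object) edges; duplicates collapse."""
    return {standardise(record) for record in records}


def query(graph, subject=None, relation=None, obj=None):
    """Step 6: return the edges matching the given fields (None matches anything)."""
    return sorted(
        edge
        for edge in graph
        if subject in (None, edge[0])
        and relation in (None, edge[1])
        and obj in (None, edge[2])
    )


graph = generate_graph(raw_records)
print(query(graph, relation="TARGETS"))
```

Without the standardisation step, the first and third records would be stored as two different edges between four different nodes, which is why step 4 comes before graph generation.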

23 of 30

Building a Knowledge Graph

Steps 1 and 2: define the questions and the data model

24 of 30

Building a Knowledge Graph

Steps 1 and 2: define the questions and the data model

Exercise

Create a data model that allows us to answer the question:

What drugs related to our disease of interest target some of the proteins identified in our experiment or relevant protein complexes and pathways?
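One possible starting point, offered as an assumption and not as the reference answer to the exercise, is to write the data model down as a set of allowed (node type, relationship type, node type) patterns. The type and relationship names below are hypothetical. A small check then confirms that the question is answerable, meaning that the model links diseases to experiments through drugs and proteins.

```python
# Hypothetical data model: allowed (source type, relationship, target type) patterns.
SCHEMA = {
    ("Drug", "TREATS", "Disease"),
    ("Drug", "TARGETS", "Protein"),
    ("Protein", "PART_OF", "Complex"),
    ("Protein", "PARTICIPATES_IN", "Pathway"),
    ("Protein", "IDENTIFIED_IN", "Experiment"),
}


def is_valid(source_type, relation, target_type):
    """True if the data model allows this kind of relationship."""
    return (source_type, relation, target_type) in SCHEMA


def reachable_types(start):
    """Node types connected to `start` when relationships are followed in either direction."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for source, _, target in SCHEMA:
            for a, b in ((source, target), (target, source)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


print(sorted(reachable_types("Disease")))
```

Checking new records against the schema before loading them (`is_valid`) keeps the graph consistent with the questions it was designed to answer.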

25 of 30

Graph Databases

Knowledge Graphs became popular in 2012 thanks to Google (proprietary graphs)
What made them accessible was the development of open-source Graph Databases
Graph Databases are NoSQL databases that use graph structures to represent and store data
Data is represented as Nodes, Relationships and properties

They use their own query languages: Cypher, SPARQL, GraphQL, Gremlin, etc.

https://en.wikipedia.org/wiki/Graph_database

[Figure labels: node, relationship, node, property]

26 of 30

Graph Data Models

Semantics vs Programmers

Semantic Graphs or Triple-stores

The network represents meaning through semantic relationships, which simplifies reasoning
Follows the Resource Description Framework (RDF) data model

Properties need to be represented as nodes
Allows n-ary relationships

Uses Uniform Resource Identifiers (URIs) to identify concepts
Used for Ontologies —> cancer — is_a — disease
The query language used is SPARQL
Vendors (I know):

STARDOG (https://www.stardog.com/)
PoolParty (https://www.poolparty.biz/)
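To make the triple model concrete, the sketch below stores subject, predicate, object triples in a plain Python set and follows `is_a` links transitively, a very small stand-in for the kind of reasoning mentioned above. It is not RDF or SPARQL: the `ex:` prefix abbreviates hypothetical URIs, and `match` and `ancestors` are illustrative helpers.

```python
# Triples as (subject, predicate, object); "ex:" abbreviates hypothetical URIs.
triples = {
    ("ex:cancer", "ex:is_a", "ex:disease"),
    ("ex:lung_cancer", "ex:is_a", "ex:cancer"),
    # A property is just another triple: its value is the object node.
    ("ex:lung_cancer", "ex:label", "lung cancer"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t
        for t in store
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    }


def ancestors(store, term):
    """Follow is_a links transitively to collect every broader concept."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(store, s=current, p="ex:is_a"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(sorted(ancestors(triples, "ex:lung_cancer")))
```

The label of `ex:lung_cancer` is stored as a triple of its own, which illustrates the point that properties are represented as nodes in this model.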

[Figure labels: :item; triple of the form subject, predicate, object]

27 of 30

Graph Data Models

Semantics vs Programmers

Labelled Property Graphs (LPG)

The network represents relationships between pairs of nodes; nodes and relationships carry labels and properties describing id, type, class, etc.
The number of nodes is reduced by storing attributes as properties instead of as separate nodes
Does not allow n-ary relationships; properties are used instead

The query language used is Cypher (not a standard but widely adopted)
Vendors (I know):

Neo4j (https://neo4j.com/)
TypeDB (https://vaticle.com/)
Memgraph (https://memgraph.com/)
FalkorDB (https://www.falkordb.com/)
NebulaGraph (https://www.nebula-graph.io/)
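A labelled property graph can be sketched with two small Python dataclasses. This illustrates the data model only, not how any of the vendors above store data; the class names, identifiers, labels, and property values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str  # id of the source node
    rel_type: str
    end: str  # id of the target node
    properties: dict = field(default_factory=dict)


# Attributes such as names live on the node as properties, not as extra nodes.
nodes = {
    "d1": Node("d1", {"Drug"}, {"name": "DrugX"}),
    "p1": Node("p1", {"Protein"}, {"name": "ProteinA", "organism": "human"}),
}

# Relationships connect exactly two nodes and can carry properties too.
relationships = [
    Relationship("d1", "TARGETS", "p1", {"source": "hypothetical_db", "score": 0.9}),
]


def neighbours(node_id, rel_type):
    """Nodes reached from node_id through outgoing relationships of the given type."""
    return [
        nodes[r.end]
        for r in relationships
        if r.start == node_id and r.rel_type == rel_type
    ]


print([n.properties["name"] for n in neighbours("d1", "TARGETS")])
```

The graph has two nodes and one relationship; the same information expressed as triples would need an additional node for every property value.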

[Figure labels: node, relationship, node, property]

https://www.oxfordsemantic.tech/fundamentals/what-is-a-labeled-property-graph


28 of 30

Cypher Query Language

Cypher is a query language for property graphs that provides a visual way of matching patterns and relationships

https://neo4j.com/developer/cypher/
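The "visual" aspect is that a Cypher pattern is drawn almost like the graph itself: nodes in parentheses, relationships in square brackets with an arrow. The sketch below holds such a query as a Python string, which is not executed here and would need a running graph database, and pairs it with a simplified in-memory equivalent that ignores node labels. The labels, relationship types, property names, and data are hypothetical.

```python
# A Cypher pattern, shown as text only: (node)-[relationship]->(node).
CYPHER_QUERY = """
MATCH (d:Drug)-[:TARGETS]->(p:Protein)
WHERE p.name IN $identified_proteins
RETURN d.name AS drug, p.name AS protein
"""

# Hypothetical edges as (start name, relationship type, end name).
edges = [
    ("DrugX", "TARGETS", "ProteinA"),
    ("DrugY", "TARGETS", "ProteinB"),
    ("DrugX", "TREATS", "DiseaseZ"),
]


def match_targets(edge_list, identified_proteins):
    """In-memory counterpart of the pattern: TARGETS edges ending in a listed protein."""
    return [
        {"drug": start, "protein": end}
        for start, rel_type, end in edge_list
        if rel_type == "TARGETS" and end in identified_proteins
    ]


print(match_targets(edges, {"ProteinA"}))
```

In the Cypher version, the filtering that the Python function does by hand is expressed declaratively by the pattern and the `WHERE` clause.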

Steps 5 and 6: generate the graph and query it
