1 of 73

Generation and Application of Biomedical Knowledge Graphs

PhD Colloquium of

Charles Tapley Hoyt

ORCID: 0000-0003-4423-4370

Presented December 3rd, 2019 at the University of Bonn

First reviewer: Prof. Dr. Martin Hofmann-Apitius

Second reviewer: Prof. Dr. Andreas Weber

Committee member (related field): Prof. Dr. Thomas Schultz

Committee member (unrelated field): Prof. Dr. Diana Imhof

2 of 73

Knowledge Graphs for Storage and Integration

Scannell, J. W., et al. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery, 11(3), 191–200.

3 of 73

Knowledge Graphs for Storage and Integration

From https://nbviewer.jupyter.org/github/pybel/pybel-notebooks/blob/master/example_knowledge/Sialic%20Acid%20Subgraph.ipynb

4 of 73

Biomedical Knowledge Graphs (KGs)

Enable formalization of knowledge (as triples)
Goals of biomedical KGs

Information retrieval, exploration, and visualization
Reason over experimental and clinical observations
Propose new experiments

Comparison of BEL

Provenance
Experimental and biological contextualization
Close to mechanistic

Systems Biology Markup Language (SBML)

Biological Pathways Exchange (BioPAX)

Biological Expression Language (BEL)

Resource Description Framework (RDF)

Web Ontology Language (OWL)

Different formalisms for KGs

5 of 73

Example BEL Statement

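a(CHEBI:corticosteroid) decreases bp(MESH:"Oxidative Stress")

Subject: a(CHEBI:corticosteroid)

Predicate: decreases

Object: bp(MESH:"Oxidative Stress")

Each term has the form Type(Namespace:Name), where Namespace:Name is the identifier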
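To make the subject/predicate/object reading concrete, here is a minimal Python sketch that splits the example statement into a triple. It is an illustration only: the regular expression, the function names (parse_term, parse_statement), and the short relation list are assumptions that cover a tiny subset of BEL, and it is not how PyBEL's parser and validator work.

```python
import re

# One BEL term in the simplified form function(NAMESPACE:name); the name is
# quoted when it contains spaces. This covers only a small subset of BEL.
TERM = re.compile(
    r'(?P<function>\w+)\((?P<namespace>\w+):(?P<name>"[^"]+"|[^)\s]+)\)'
)


def parse_term(text):
    """Split a simple BEL term into its function, namespace, and name."""
    match = TERM.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple BEL term: {text!r}")
    parts = match.groupdict()
    parts["name"] = parts["name"].strip('"')
    return parts


def parse_statement(statement, relations=("increases", "decreases")):
    """Return (subject, predicate, object) for a simple BEL statement."""
    for relation in relations:
        subject, separator, obj = statement.partition(f" {relation} ")
        if separator:
            return parse_term(subject), relation, parse_term(obj)
    raise ValueError(f"no supported relation in: {statement!r}")


subject, predicate, obj = parse_statement(
    'a(CHEBI:corticosteroid) decreases bp(MESH:"Oxidative Stress")'
)
print(subject, predicate, obj)
```

Real BEL also carries provenance and context annotations and supports nested terms, which a sketch like this ignores.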

6 of 73

Causal Modeling with BEL

Petri Nets ¹

Image from https://upload.wikimedia.org/wikipedia/commons/f/fe/Detailed_petri_net.png
Image from https://neurommsig.scai.fraunhofer.de/
Image from Lopez, C. F., et al. (2013). Programming biological models in Python using PySB. Molecular Systems Biology, 9(646), 646.

Influence Maps, Differential Equation Models ³

Causal Networks ² (e.g., BEL)

Higher Granularity

7 of 73

Goals of this Thesis

Generation. Improve methods for curation and semantic data integration to generate highly granular biomedical knowledge graphs
Application. Develop novel methods for using prior biomedical knowledge to propose new biological hypotheses

Investigate the aetiology of disease (target prioritization)
Understand drugs' mechanisms of action (drug repositioning)

8 of 73

Part 1: Generation

9 of 73

10 of 73

Development of PyBEL

Previously curated neurodegenerative disease (NDD) KGs from NeuroMMSig
Lacked stable open source software to:

Check syntactic and semantic correctness
Write new algorithms
Re-implement previous algorithms
Analyze new data

Parser and

Validator

Network Data Structure

Data Converter

Database

External Data (BEL Script, etc.)

Visualize

Hoyt, C. T., et al. (2017). PyBEL: a computational framework for Biological Expression Language. Bioinformatics, 34(4), 703–704.
Domingo-Fernández, D., et al. (2017). Multimodal mechanistic signatures for neurodegenerative diseases (NeuroMMSig): a web server for mechanism enrichment. Bioinformatics, 33(22), 3679–3681.

11 of 73

Curation in the Cloud

Technologies for checking syntax/semantics

Version control system
Continuous integration

New workflow for achieving curator agreement
Proof of concept: re-curated NDD KGs
Built Curation of Neurodegeneration Supporting Ontology (CONSO; unpublished)

Hoyt, C. T., et al. (2019). Re-curation and rational enrichment of knowledge graphs in Biological Expression Language. Database, 2019.

12 of 73

Manual Curation is Unsustainable

Led curation of 353 full-text articles

Curation of Neurodegeneration in BEL (CONIB; unpublished)

There's Too Much Literature: it almost doubles every ten years (example: chemistry in PubMed)
Need automatic computer assistance

Natural language processing / Text mining

Reference: https://www.ncbi.nlm.nih.gov/pubmed/?term=chemistry accessed on 2019-11-03

13 of 73

Automatic Extraction with INDRA

INDRA (Integrated Network and Dynamical Reasoning Assembler)
Developed open source at Harvard
Combines several text mining systems
Runs on a huge corpus
Calculates confidences for all triples

Gyori, B. M., et al. (2017). From word models to executable models of signaling networks using automated assembly. Molecular Systems Biology, 13(11), 954.

Text

14 of 73

Rational Curation

Large-scale reading with INDRA
Export as BEL for curation via PyBEL
Proof-of-concept: applied to enrich NDD KGs

Hoyt, C. T., et al. (2019). Re-curation and rational enrichment of knowledge graphs in Biological Expression Language. Database, 2019.

15 of 73

Integrating Biological Data Sources in Bio2BEL

Reproducible, automated, reliable acquisition and transformation of biological data sources to BEL

Hoyt, C. T., et al. (2019). Integration of Structured Biological Data Sources using Biological Expression Language. bioRxiv, 631812.

16 of 73

Integrating Biological Data Sources in Bio2BEL

Hoyt, C. T., et al. (2019). Integration of Structured Biological Data Sources using Biological Expression Language. bioRxiv, 631812.

... and getting bigger

17 of 73

Visualization and Exploration in BEL Commons

An environment for summarizing, exploring, and analyzing BEL KGs
Available on web at https://bel-commons.scai.fraunhofer.de

Hoyt, C. T., et al. (2018). BEL Commons: an environment for exploration and analysis of networks encoded in Biological Expression Language. Database, 2018(3), 1–11.

18 of 73

Part 2: Applications

What can we do with these knowledge graphs?

19 of 73

20 of 73

21 of 73

Drug Discovery and Development

Problems: long and expensive
Failure in Phase II from toxicity
Failure in Phase III from lack of efficacy

Scannell, J. W., Blanckley, A., Boldon, H., & Warrington, B. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery, 11(3), 191–200.

22 of 73

The Challenge: Unraveling the Triangle

Drug Discovery/

Drug Repositioning

Proteochemometrics

Target Prioritization

23 of 73

The Challenge: Unraveling the Triangle

Drug Discovery/

Drug Repositioning

Target Prioritization

Proteochemometrics

24 of 73

Network Representation Learning (NRL)

Generate dense embeddings for vertices in a KG
Main tasks:
Clustering / Downstream Machine Learning
Link Prediction
Entity Disambiguation

25 of 73

Intuition from the DeepWalk Algorithm

Generate random walks starting at each node
Train a Skip-Gram language model using nodes as words and walks as sentences
Outputs vectors for each node for downstream machine learning
*GAT2VEC includes labels (e.g., differential gene expression)

Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online Learning of Social Representations. https://doi.org/10.1145/2623330.2623732
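The first two steps (walks as sentences, nodes as words) can be sketched in plain Python. This is a minimal illustration on a hypothetical toy graph: the function names, parameters, and graph are assumptions, and the actual Skip-Gram training step is left to a word-embedding library rather than implemented here.

```python
import random


def random_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Generate uniform random walks starting at every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def skipgram_pairs(walk, window):
    """List the (center, context) pairs a Skip-Gram model would train on."""
    pairs = []
    for i, center in enumerate(walk):
        start, stop = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(start, stop) if j != i)
    return pairs


# Hypothetical toy graph as an adjacency list
toy_graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
walks = random_walks(toy_graph, walks_per_node=2, walk_length=5)
print(walks[0], skipgram_pairs(walks[0], window=1)[:3])
```

Nodes that co-occur in many walks end up with many shared training pairs, which is what pulls their vectors together.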

26 of 73

Emig's Approach to Target Prioritization

Emig, D., Ivliev, A., Pustovalova, O., Lancashire, L., Bureeva, S., Nikolsky, Y., & Bessarabova, M. (2013). Drug Target Prediction and Repositioning Using an Integrated Network-Based Approach. PLoS ONE, 8(4).

Gene Features (manually selected)

Gene Labels

(Disease Associated Genes)

Logistic Regression Classifier

Protein-Protein Interaction Network

Differential Gene Expression

27 of 73

GuiltyTargets

Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. bioRxiv, 521161

Gene Features

LEARNED WITH NRL

Gene Labels

(Disease Associated Genes)

Logistic Regression Classifier

Protein-Protein Interaction Network

Differential Gene Expression

28 of 73

Emig vs. GuiltyTargets Evaluation (AUC-ROC)

Disease	Emig (published)	Emig (redo)	GuiltyTargets (ours)	GuiltyTargets (randomized)
Acute myeloid leukemia	.82	.82	.93	.50
Hepatocellular carcinoma	.80	.73	.94	.51
Idiopathic pulmonary fibrosis	.88	.55	.92	.50
Liver cirrhosis	.67	.55	.94	.51
Multiple sclerosis	.72	.76	.93	.50
Alzheimer's disease	-	.63	.94	.50
...	...	...	...	...

Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. bioRxiv, 521161

***Disclaimer: we have new results as of the week of this defense that are slightly different, but don't contradict the conclusions from this table
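As a reminder of what the metric measures: AUC-ROC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which is why a randomized model sits near .50. A minimal sketch of that pairwise definition follows; the function name and toy inputs are illustrative assumptions, not the evaluation code behind the table.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = known target) and classifier scores
print(auc_roc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations.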

29 of 73

Limitations of GuiltyTargets

Built on a homogeneous PPI network

Could use the finer-grained networks generated in Part 1

Limited novelty due to guilt-by-association
No edge weights
Single task machine learning

Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. bioRxiv, 521161

30 of 73

Side Effect Network (SEffNet)

KG with drugs, targets, diseases, and side effects
Investigated performance benefits of edge weights
Multi-task machine learning

Training Edges

LEARNED WITH NRL

Node and Edge Features

Testing Edges + Negative Sampling

Logistic Regression Classifier

Heterogeneous Network

Aldisi, R. ... & Hoyt, C.T. (2020) Applications of Network Representation Learning for Drug Repositioning in Parkinson's Disease. Manuscript in preparation
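The negative sampling step in the diagram can be sketched as drawing node pairs that are not known edges, so the classifier sees both positive and negative examples. This is a minimal sketch under simplifying assumptions: edges are undirected and untyped, every known edge connects two of the given nodes, and the function name and toy data are hypothetical rather than taken from SEffNet.

```python
import random


def sample_negative_edges(nodes, positive_edges, n_samples, seed=0):
    """Sample distinct node pairs that do not appear among the known edges."""
    rng = random.Random(seed)
    nodes = list(nodes)
    known = {frozenset(edge) for edge in positive_edges}
    # Assumes no self-loops and that all known edges lie within `nodes`
    available = len(nodes) * (len(nodes) - 1) // 2 - len(known)
    if n_samples > available:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < n_samples:
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in known:
            negatives.add(pair)
    return [tuple(sorted(pair)) for pair in negatives]


# Hypothetical toy network: two drugs and two targets
toy_nodes = ["drug1", "drug2", "target1", "target2"]
toy_positives = [("drug1", "target1"), ("drug2", "target2")]
toy_negatives = sample_negative_edges(toy_nodes, toy_positives, n_samples=3)
print(toy_negatives)
```

In a heterogeneous network one would likely restrict sampling to pairs of compatible node types; that refinement is omitted here.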

31 of 73

Side Effect Network (SEffNet)

UMLS	Name	LOR
C0006384	Bundle branch block	.000
C0575090	Balance disorder	.000
C0878544	Cardiomyopathy	.001
C0233794	Memory impairment	.001
C0004239	Atrial flutter	.001
C0160390	Liver injury	.001
C0020676	Hypothyroidism	.001
C0002884	Hypochromic anaemia	.001
C0034069	Pulmonary fibrosis	.001
C0233477	Dysphoria	.001

Aldisi, R. ... & Hoyt, C.T. (2020) Applications of Network Representation Learning for Drug Repositioning in Parkinson's Disease. Manuscript in preparation

32 of 73

Ongoing and Next Steps

Edge prediction in networks containing drugs, side-effects, and targets to identify drugs' mechanisms of action
Comparison of NRL to drug repositioning with engineered features from Hetionet [Himmelstein, et al. (2017)]
Multitask target prioritization with Hetionet

Incorporation of literals in knowledge graphs such as chemical fingerprints, differential gene expression values, clinical variables, etc.
Benchmark more learned representations versus engineered features
Extend approaches to clinical modalities (embed patients for predictive modeling)

Himmelstein, D. S., et al. (2017). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, 6.

33 of 73

Impact

Development of open-source BEL ecosystem

BEL compiler and tools (PyBEL)*
Integration tools (Bio2BEL)*
Interactive web application (BEL Commons)*

Curation

New curation strategy (re-curation workflow and new curation guidelines)
Semi-automated curation workflow with open source text mining tools

Applications

Network representation learning for hypothesis generation (GuiltyTargets)*
Comparative mechanism enrichment algorithm (EpiCom; not presented)
Automated simulation (BEL2ABM; not presented)

*Interest and ongoing adoption in both academia and industry

34 of 73

Code and Data Availability

PyBEL: https://github.com/pybel/pybel

Re-curation Workflow: https://github.com/pybel/pybel-git

Enrichment Workflow: https://github.com/bel-enrichment

CONSO: https://github.com/pharmacome/conso

CONIB: https://github.com/pharmacome/conib

Bio2BEL: https://github.com/bio2bel

BEL Commons: https://github.com/bel-commons

GuiltyTargets: https://github.com/guiltytargets

SEffNet: https://github.com/seffnet

35 of 73

Lessons Learned in Communication

We haven't cured any disease or elucidated unknown side effects! Presented were tools for generating hypotheses
Collaboration with experimentalists and clinicians is key
Overselling what's possible will decrease our ability to have impact (even if the market is lucrative)

36 of 73

Acknowledgements

Supervision

Prof. Dr. Martin Hofmann-Apitius
Prof. Dr. Holger Fröhlich

Committee

Prof. Dr. Andreas Weber
Prof. Dr. Diana Imhof
Prof. Dr. Thomas Schultz

Master's Students

Özlem Muslu, Rana Aldisi, Lingling Xu, Maurici Pio de Lacerda, Vinay Bharadhwaj

Coworkers

Fraunhofer SCAI.Bio
Fraunhofer SCAI.IT
Fraunhofer FIT
Fraunhofer IME

Collaborators

Scott Colby (Stanford)
Dr. Dexter Pratt (UCSD/Cytoscape)
Dr. John Bachman and Dr. Ben Gyori (Harvard)
Max Berrendorf (LMU Munich)
Laurent Vermue (Technical University of Denmark)
Dr. Denés Türei, Nicolàs Palacio-Escat, and Prof. Dr. Julio Saez-Rodriguez (University of Heidelberg/EMBL)

Special Thanks

Daniel Domingo Fernández
Mehdi Ali (he's a good guy)
Emperor André Gemünd

Family

Friends

PhD Crew, Girgit Crew, FFF Crew, ESN Bonn

37 of 73

Projects and Funding

AETIONOMY (IMI): https://www.aetionomy.eu/

B-IT Foundation: http://www.b-it-center.de/

Cytoscape Consortium: https://cytoscape.org/

The Human Brain Pharmacome (Fraunhofer): https://pharmacome.scai.fraunhofer.de/

Fraunhofer Center for Machine Learning: https://www.cit.fraunhofer.de/de/zentren/maschinelles-lernen.html

38 of 73

Ph.D. Publications

Hoyt, C. T., Konotopez, A., & Ebeling, C. (2017). PyBEL: a computational framework for Biological Expression Language. Bioinformatics (Oxford, England), 34(4), 703–704. https://doi.org/10.1093/bioinformatics/btx660
Hoyt, C. T., Domingo-Fernández, D., & Hofmann-Apitius, M. (2018). BEL Commons: an environment for exploration and analysis of networks encoded in Biological Expression Language. Database, 2018(3), 1–11. https://doi.org/10.1093/database/bay126
Hoyt, C. T., et al. (2019). Re-curation and rational enrichment of knowledge graphs in Biological Expression Language. Database, 2019(1). https://doi.org/10.1093/database/baz068
Hoyt, C. T., et al. (2019). Integration of Structured Biological Data Sources using Biological Expression Language. bioRxiv, 631812. https://doi.org/10.1101/631812
Gündel, M., Hoyt, C. T., & Hofmann-Apitius, M. (2018). BEL2ABM: Agent-based simulation of static models in Biological Expression Language. Bioinformatics (Oxford, England), 34(13), 2316–2318. https://doi.org/10.1093/bioinformatics/bty107
Hoyt, C. T., et al. (2018). A systematic approach for identifying shared mechanisms in epilepsy and its comorbidities. Database, 2018(1). https://doi.org/10.1093/database/bay050
Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. bioRxiv, 521161. https://doi.org/10.1101/521161

39 of 73

Other Publications

Bradford, R., Sturm, T., Weber, A., Davenport, J. H., England, M., Errami, H., Gerdt, V., Grigoriev, D., Hoyt, C. T., Košta, M., & Radulescu, O. (2017). A Case Study on the Parametric Occurrence of Multiple Steady States. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation - ISSAC ’17 (Vol. Part F1293, pp. 45–52). New York, New York, USA: ACM Press.
Domingo-Fernández, D., Hoyt, C. T., Alvarez, C. B., Marin-Llao, J., & Hofmann-Apitius, M. (2018). ComPath: an ecosystem for exploring, analyzing, and curating mappings across pathway databases. npj Systems Biology and Applications, 5(1), 3.
Domingo-Fernández, D., Mubeen, S., Marín-Llaó, J., Hoyt, C. T., & Hofmann-Apitius, M. (2019). PathMe: merging and exploring mechanistic pathway knowledge. BMC Bioinformatics, 20(1), 243.
Ali, M., Hoyt, C. T., Domingo-Fernández, D., Lehmann, J., & Jabeen, H. (2019). BioKEEN: A library for learning and evaluating biological knowledge graph embeddings. Bioinformatics (Oxford, England).
Bradford, R., Davenport, J. H., England, M., Errami, H., Gerdt, V., Grigoriev, D., Hoyt, C. T., Kosta, M., Radulescu, O., Sturm, T., & Weber, A. (2019). Identifying the Parametric Occurrence of Multiple Steady States for some Biological Networks. arXiv, 1902.04882
Brito, E., Georgiev, B., Domingo-Fernández, D., Hoyt, C. T., & Bauckhage, C. (2019) RatVec: A General Approach for Low-dimensional Distributed Vector Representations via Domain-specific Rational Kernels. In Proceedings of ECML PKDD

Ali, M., Jabeen, H., Hoyt, C. T., & Lehmann, J. (2019). The KEEN Universe: An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability. In Proceedings of ISWC.
Ali, M., Domingo-Fernández, D., Hoyt, C. T., & Lehmann, J. (2019). Predicting Missing Links Using PyKEEN. In Proceedings of ISWC.
Karki, K., Kodamullil, A. T., Hoyt, C. T., & Hofmann-Apitius, M. (2019). Quantifying mechanisms in neurodegenerative diseases (NDDs) using candidate mechanism perturbation amplitude (CMPA) algorithm. BMC Bioinformatics, 20(1), 494.
Emon, M. A., Domingo-Fernández, D., Hoyt, C. T., & Hofmann-Apitius, M. (2019) PS4DR: a multimodal workflow for identification and prioritization of drugs based on pathway signatures. BMC Bioinformatics, submitted.
Mubeen, S., Hoyt, C. T., Gemünd, A., Hofmann-Apitius, M., Fröhlich, H., & Domingo-Fernández, D. (2019). The Impact of Pathway Database Choice on Statistical Enrichment Analysis and Predictive Modeling. Frontiers in Genetics.
Humayun, F., Domingo-Fernández, D., George, A. A. P., Hopp, M.-T., Syllwasschy, B. F., Detzel, M. S., Hoyt, C. T., Hofmann-Apitius, M., Imhof, D. (2019). A computational approach for mapping heme biology in the context of hemolytic disorders. bioRxiv, 804906.

40 of 73

Extras

41 of 73

Re-curation

42 of 73

Re-curation Workflow

(Team) Handle names

Normalize entities from custom namespaces that were never checked
Update all namespaces (only a small manual effort)

(Team) Checking correctness

High Confidence: a statement can be asserted from the given evidence
Medium Confidence: a statement is incorrect or incomplete. Update accordingly.
Low Confidence: the meaning of a statement is unclear and should be discussed in a group
No Confidence: the statement is not supported by the evidence and cannot be fixed. Delete.

(Leaders) Finalize curation

Read all statements with Medium and High confidence annotations and assign Very High where appropriate

43 of 73

Re-curation Results on NeuroMMSig

Significant effort by many people to ensure BEL syntactic and semantic quality

Manual curation efforts added huge, high-quality biological novelty.

44 of 73

Manual Curation

45 of 73

CONSO

Curation of Neurodegeneration Supporting Ontology (CONSO)
Generated during re-curation of NeuroMMSig (and later full-text curation)
31 classes, 366 entities, 449 relationships, 1366 synonyms, and 271 cross references

https://pharmacome.github.io/conso/

46 of 73

CONIB

Curation of Neurodegeneration in BEL (CONIB)
Mechanisms underpinning tauopathies including:

Tau modification and hyper-modification
Tau aggregation
Proteostasis
Nicotinic receptor signalling

Related diseases and indications (ALS, MS, Huntington's, etc.)

Curators	10
Full Text Articles	353
Time (in Months)	24
Authors	1969
Nodes	5862
Edges	20860

47 of 73

Text Mining

48 of 73

Biology Has Intricate Relationships

Types

Causal
Correlative
Associative
Ontological

Modes

Activities
Abundances
Efflux

Directionality

Uni-directional
Bi-directional (reflexive)

Polarity

Increase
Decrease
Unknown
None

Contact

Direct
Indirect

States

Experimental context
Subcellular location
PTMs
Fusions
Mutations
Pre/post-conditions

49 of 73

Relation Extraction Methods

Manual

Curation at highest granularity

Automatic

Rule-based

REACH (REading and Assembling Contextual and Holistic mechanisms from text)

Natural Language Processing (NLP)

TRIPS

Machine Learning / Deep Learning

Turku Event Extraction System (TEES)
BELIEF (BEL Information Extraction Workflow)

Semi-automatic

As good as automatic methods get, manual curation will continue to be necessary

50 of 73

Rule-based Extraction in REACH

Valenzuela-Escárcega, et al. Large-scale automated machine reading discovers new cancer-driving mechanisms. Database (2018) Vol. 2018: article ID bay098; doi:10.1093/database/bay098

51 of 73

Rational Enrichment

52 of 73

Priority Subgraphs

Discussion with Fraunhofer IME and Hugo Geerts (In Silico Biosciences) led to a prioritization of signatures for re-curation

Criteria

Novelty
Druggability
Assay-ability
Expert resource availability and advice

Top 4 Subgraphs

Tau protein subgraph
Inflammatory response subgraph
Insulin signal transduction
GSK3 subgraph & DKK1 subgraph

53 of 73

BEL Commons

54 of 73

55 of 73

56 of 73

57 of 73

58 of 73

Emig's Approach

59 of 73

Emig's Data Sources

Protein-protein interaction (PPI) networks

HIPPIE
STRING

Disease-specific differential gene expression profiles

GEO
ArrayExpress

Fingerprinting genes

Local (neighborhood scoring, interconnectivity)
Global (random walk, network propagation)

Disease-target associations

Integrity (http://integrity.thomson-pharma.com)
Therapeutic Target Database

Positive-unlabelled learning

Logistic regression
Cross validation

Emig, D., Ivliev, A., Pustovalova, O., Lancashire, L., Bureeva, S., Nikolsky, Y., & Bessarabova, M. (2013). Drug Target Prediction and Repositioning Using an Integrated Network-Based Approach. PLoS ONE, 8(4).

60 of 73

Emig's Local Network Features

Interconnectivity

Neighborhood Scoring

Reference: Emig, D., Ivliev, A., Pustovalova, O., Lancashire, L., Bureeva, S., Nikolsky, Y., & Bessarabova, M. (2013). Drug Target Prediction and Repositioning Using an Integrated Network-Based Approach. PLoS ONE, 8(4). https://doi.org/10.1371/journal.pone.0060618

61 of 73

Emig's Global Network Features

Random Walks and Network Propagation

Reference: Emig, D., Ivliev, A., Pustovalova, O., Lancashire, L., Bureeva, S., Nikolsky, Y., & Bessarabova, M. (2013). Drug Target Prediction and Repositioning Using an Integrated Network-Based Approach. PLoS ONE, 8(4). https://doi.org/10.1371/journal.pone.0060618

62 of 73

GuiltyTargets

63 of 73

Natural Language Model: Skip-Gram

A language model that maximizes the co-occurrence probability of words in the same window

Mikolov, T., Corrado, G., Chen, K., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space, 1–12.
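For reference, the Skip-Gram objective in its commonly stated form is given below. The notation is an assumption of this note rather than taken from the slide: a sequence of T words w_1, ..., w_T, a window radius c, input and output vectors v and v', and a vocabulary of W words. In the DeepWalk setting, the words are nodes and the sequences are random walks.

```latex
\max_{\theta} \; \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}
```

In practice the full softmax in the denominator is usually approximated, for example with hierarchical softmax or negative sampling.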

64 of 73

GAT2VEC

Extends DeepWalk to handle nodes' labels

Sheikh, N., Kefato, Z., & Montresor, A. (2018). Gat2Vec: Representation Learning for Attributed Graphs. Computing, 1–23. https://doi.org/10.1007/s00607-018-0622-9

65 of 73

GAT2VEC Algorithm

Given: an attributed graph G = (V, E, F), attributes A, and attribution function F: V → 2^A

Append structural edges {(v, a) | v ∈ V, a ∈ F(v)} to G
Generate γ random walks starting at each v ∈ V
Filter all a ∈ A from the walks
Train a Skip-Gram model as previously in DeepWalk

Sheikh, N., Kefato, Z., & Montresor, A. (2018). Gat2Vec: Representation Learning for Attributed Graphs. Computing, 1–23. https://doi.org/10.1007/s00607-018-0622-9
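The first three steps as listed on this slide can be sketched in plain Python; training the Skip-Gram model on the resulting walks is left out. This follows the simplified description above, and the published algorithm may differ in detail. The function name, the ("attr", a) tagging of attribute nodes, and the toy gene labels are hypothetical.

```python
import random


def attributed_walks(edges, node_attributes, gamma, walk_length, seed=0):
    """Walk over a graph augmented with attribute nodes, then drop them."""
    rng = random.Random(seed)
    adjacency = {}

    def connect(u, v):
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    for u, v in edges:
        connect(u, v)

    # Step 1: add an edge (v, a) for every attribute a in F(v). Attribute
    # nodes are tagged so they cannot collide with vertex names.
    attribute_nodes = set()
    for vertex, attributes in node_attributes.items():
        for attribute in attributes:
            tagged = ("attr", attribute)
            attribute_nodes.add(tagged)
            connect(vertex, tagged)

    vertices = sorted({n for edge in edges for n in edge} | set(node_attributes))
    walks = []
    for _ in range(gamma):  # Step 2: gamma walks starting at each vertex
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency.get(walk[-1], [])
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            # Step 3: filter attribute nodes out of the walk
            walks.append([n for n in walk if n not in attribute_nodes])
    return walks


# Hypothetical example: genes labeled by direction of differential expression
toy_edges = [("APP", "MAPT"), ("MAPT", "GSK3B")]
toy_labels = {"APP": ["up"], "MAPT": ["down"], "GSK3B": ["up"]}
toy_walks = attributed_walks(toy_edges, toy_labels, gamma=3, walk_length=6)
print(toy_walks[0])
```

Because attribute nodes are removed after walking, two vertices that share a label can end up adjacent in a walk even when no edge connects them, which is how the labels influence the learned embedding.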

66 of 73

GuiltyTargets Workflow

Random walks and GAT2VEC

Disease-specific differential gene expression

Emig, D., et al. (2013). Drug Target Prediction and Repositioning Using an Integrated Network-Based Approach. PLoS ONE, 8(4).

Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. bioRxiv, 521161

67 of 73

GuiltyTargets Predictions for AD

Symbol	Protein Type/Class	Score
CHRNB4	Nicotinic acetylcholine receptor subunit	.700
ITPR1	IP3 receptor	.689
GLRA2	Ligand-gated chloride channel subunit	.619
COMT	Catechol-O-methyltransferase	.587
GRIK2	Ionotropic glutamate receptor subunit	.587
CHRM4	Muscarinic acetylcholine receptor	.586
CHRFAM7A	Nicotinic acetylcholine receptor	.557
HTR7	Serotonin receptor 7	.532
KCNK3	Potassium K+ channels	.523
....	...	...

Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. bioRxiv, 521161

68 of 73

SEffNet

69 of 73

Learning on SEffNet

Node Embeddings

DeepWalk
Node2Vec
LINE
SDNE
HOPE
GraRep

Edge Embeddings

Concatenation
Hadamard

Downstream Task

Edge classification with logistic regression
Cross validation schema
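The two edge-embedding operators named above combine a pair of node vectors into one feature vector for the edge classifier. A minimal sketch in plain Python follows; the function names and the two-dimensional toy embeddings are hypothetical.

```python
def concatenate(u, v):
    """Edge feature of length 2d; depends on the order of the two nodes."""
    return list(u) + list(v)


def hadamard(u, v):
    """Element-wise product of length d; the same for (u, v) and (v, u)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return [a * b for a, b in zip(u, v)]


def edge_features(embeddings, source, target, operator=hadamard):
    """Build the feature vector for one candidate edge."""
    return operator(embeddings[source], embeddings[target])


# Hypothetical 2-dimensional node embeddings
toy_embeddings = {"drug": [0.5, -1.0], "target": [2.0, 4.0]}
print(edge_features(toy_embeddings, "drug", "target"))
print(edge_features(toy_embeddings, "drug", "target", operator=concatenate))
```

The resulting vectors, paired with positive and negative edge labels, are what a logistic regression classifier would be trained on.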

70 of 73

Node2vec

Return parameter (p): controls the likelihood of immediately revisiting the previous node in a walk (a higher p makes returning less likely)

In/out parameter (q): controls whether the walk stays close to the previous node (q > 1) or explores outward (q < 1)

Reference: Grover, A., & Leskovec, J. (2016). Node2Vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 855–864). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939754
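How p and q bias a walk can be shown with the second-order transition rule from the node2vec paper: after stepping from a previous node to the current node, each candidate next node gets weight 1/p if it is the previous node, 1 if it is also a neighbor of the previous node, and 1/q otherwise. The sketch below assumes an unweighted, undirected graph; the function name and toy graph are hypothetical.

```python
def transition_probabilities(adjacency, previous, current, p, q):
    """Normalized node2vec transition probabilities out of `current`."""
    weights = {}
    for candidate in adjacency[current]:
        if candidate == previous:
            weights[candidate] = 1.0 / p  # step back to where we came from
        elif candidate in adjacency[previous]:
            weights[candidate] = 1.0  # stay at distance 1 from previous
        else:
            weights[candidate] = 1.0 / q  # move outward, away from previous
    total = sum(weights.values())
    return {node: weight / total for node, weight in weights.items()}


# Hypothetical toy graph as neighbor sets
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
probabilities = transition_probabilities(toy_graph, previous="A", current="C", p=2.0, q=0.5)
print(probabilities)
```

With p = 2 and q = 0.5 the walk is discouraged from returning to A and encouraged to move outward to D.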

71 of 73

Edge2vec

Calculate edge-edge conditional probabilities
Weight walks with a transition probability matrix
Encodes heterogeneous edges

Reference: Gao, Z., Fu, G., Ouyang, C., Tsutsui, S., Liu, X., Yang, J., … Ding, Y. (2018). edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. Retrieved from http://arxiv.org/abs/1809.02269

72 of 73

Method	Parameters (values)	Mean AUC-ROC	Mean AUC-PR	Mean MCC
node2vec	Dimensions = 300; Walk length = 8; Number of walks = 8; Window size = 4; Return parameter (p) = 2.3; In/out parameter (q) = 1.9	.977	.981	.877
DeepWalk	Dimensions = 300; Walk length = 8; Number of walks = 8; Window size = 2	.969	.974	.846
HOPE	Dimensions = 300	.937	.962	.842
GraRep	Dimensions = 300; k-step = 3	.977	.981	.866
LINE	Dimensions = 300; Proximity order = 3; Epochs = 5	.979	.983	.867
SDNE	Proximity balance (a) = 0.128; Reconstruction weight (b) = 14; Epochs = 25	.927	.949	.648

73 of 73

Robustness of SEffNet
