1 of 14

Data-driven Drug Discovery for TB (D4TB)

2nd BD2K / 4th Network of BioThings Hackathon 2015

2 of 14

Motivation / Problem Statement

3 of 14

Challenges - Amount of Data

  • Biological data is growing exponentially and is difficult to keep up with.

4 of 14

Challenges - Data Analysis

  • The TB literature has grown by 75% over the last 10 years

5 of 14

Challenges - Disparate Datasets

  • Disparate, heterogeneous datasets exist that need to be linked so they can interoperate

6 of 14

Objectives

  • Find alternative/adjunct TB drug therapies
  • Integrate datasets to assist in TB drug discovery
  • Provide an interface for biologists to interact with the data

7 of 14

Overall Approach

[Workflow figure: Data → Integration (RDF/SPARQL) → Approach 1 → Results 1; TB Drugs → Chemical Similarity and Side Effect Similarity (Approach 2) → Results 2; both lead to Possible Drug Candidates]
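The Data → Integration → RDF/SPARQL step amounts to expressing each record as RDF triples so that separate datasets can be queried together. A minimal sketch, assuming rows shaped like the Approach 1 results (drug, disease, group) and emitting N-Triples; the namespace http://example.org/d4tb/ and the predicates associatedDisease and group are hypothetical placeholders, not the project's actual vocabulary.

```python
# Minimal sketch: turn tabular (drug, disease, group) rows into RDF N-Triples.
# BASE and the predicate names are HYPOTHETICAL placeholders, not the D4TB vocabulary.
from urllib.parse import quote

BASE = "http://example.org/d4tb/"  # hypothetical namespace


def iri(kind: str, label: str) -> str:
    """Build an IRI such as <http://example.org/d4tb/drug/Isoniazid>."""
    return f"<{BASE}{kind}/{quote(label.replace(' ', '_'))}>"


def literal(text: str) -> str:
    """Build a plain N-Triples string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(rows):
    """Yield two N-Triples lines (disease link, approval group) per input row."""
    for drug, disease, group in rows:
        d = iri("drug", drug)
        yield f"{d} {iri('vocab', 'associatedDisease')} {iri('disease', disease)} ."
        yield f"{d} {iri('vocab', 'group')} {literal(group)} ."


rows = [
    ("Isoniazid", "liver disease", "approved"),
    ("Isoniazid", "hepatitis", "approved"),
]

for line in rows_to_ntriples(rows):
    print(line)
```

The group triple is emitted once per row, so it repeats here; an RDF graph is a set of triples, so loading the output into a triple store keeps only one copy.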

8 of 14

Approach 1 - Linked-Data-Driven Discovery

9 of 14

Approach 1 - Results

Query 1 - Find associated diseases for TB drugs

  Drug Name      Disease         Group
  Clavulanate    pneumonia       approved
  Clavulanate    diarrhea        approved
  Isoniazid      liver disease   approved
  Isoniazid      hepatitis       approved
  ...            ...             ...

Query 2 - Find drugs associated with diseases from Query 1

  Disease         Drugs
  pneumonia       Bacampicillin, Cefprozil, Cefdinir
  diarrhea        Chloramphenicol, Isopropamide
  liver disease   Telaprevir, Icatibant
  ...             ...
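The two queries chain together: Query 1 binds a disease variable from TB drugs, and Query 2 reuses those diseases to find other drugs. A minimal sketch of that two-step join, using plain Python triple-pattern matching in place of SPARQL (None plays the role of a SPARQL variable). The predicate associatedWith is hypothetical, the toy triples come only from the example rows on this slide, and excluding the query drugs from Query 2's output is an assumption.

```python
# Minimal sketch of the two-step Approach 1 lookup over an in-memory triple list.
# The actual work used SPARQL over RDF; ASSOC is a HYPOTHETICAL predicate name.
from typing import Optional

Triple = tuple[str, str, str]

ASSOC = "associatedWith"  # hypothetical drug -> disease predicate

# Toy graph built only from the example rows shown on this slide.
triples: list[Triple] = [
    ("Clavulanate", ASSOC, "pneumonia"),
    ("Clavulanate", ASSOC, "diarrhea"),
    ("Isoniazid", ASSOC, "liver disease"),
    ("Isoniazid", ASSOC, "hepatitis"),
    ("Bacampicillin", ASSOC, "pneumonia"),
    ("Cefprozil", ASSOC, "pneumonia"),
    ("Cefdinir", ASSOC, "pneumonia"),
    ("Chloramphenicol", ASSOC, "diarrhea"),
    ("Isopropamide", ASSOC, "diarrhea"),
    ("Telaprevir", ASSOC, "liver disease"),
    ("Icatibant", ASSOC, "liver disease"),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> list[Triple]:
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def query1(tb_drugs):
    """Query 1: (drug, disease) pairs for the given TB drugs."""
    return {(d, o) for d in tb_drugs for (_, _, o) in match(d, ASSOC, None)}


def query2(diseases, exclude):
    """Query 2: (disease, drug) pairs for other drugs linked to those diseases."""
    return {(dis, s) for dis in diseases
            for (s, _, _) in match(None, ASSOC, dis) if s not in exclude}


tb = ["Clavulanate", "Isoniazid"]
q1 = query1(tb)
q2 = query2({disease for _, disease in q1}, exclude=set(tb))
print(sorted(q2))
```

In SPARQL the same chain would be a single query with a shared ?disease variable; splitting it in two mirrors the slide's Query 1 / Query 2 presentation.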

10 of 14

Approach 2 - Method

[Filtering funnel figure: starts from 23 known TB drugs; drug-pair counts at successive stages 220,116 → 7890 → 4707 → 1109 → 19; filters: SIDER side effect similarity score (<0.3) and MACCS molecular fingerprint chemical similarity (>0.75); outputs labeled Results 1, Results 2 and Possible Candidates]
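The two thresholds on this slide can be illustrated with a small pair filter. This is a sketch under stated assumptions, not the team's pipeline: the slide does not name the metrics, so chemical similarity is assumed to be the Tanimoto coefficient on MACCS key bits and side effect similarity is assumed to be the Jaccard index on sets of SIDER side effect terms. Requiring exactly one known TB drug per pair is also an assumption, and the drugs TB_1, DRUG_A and DRUG_B with their fingerprints and side effects are made-up placeholders.

```python
# Minimal sketch of the Approach 2 pair filter. ASSUMPTIONS:
#   * chemical similarity = Tanimoto coefficient on MACCS key bit sets
#   * side effect similarity = Jaccard index on SIDER side effect term sets
#   * each kept pair contains exactly one known TB drug
# All drugs, bits and side effects below are toy placeholders, not real data.
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two side effect term sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def candidate_pairs(fps, side_effects, tb_drugs, chem_min=0.75, se_max=0.3):
    """Keep pairs with chemical similarity > chem_min and side effect similarity < se_max."""
    hits = []
    for x, y in combinations(sorted(fps), 2):
        if (x in tb_drugs) == (y in tb_drugs):
            continue  # skip pairs with zero or two known TB drugs
        chem = tanimoto(fps[x], fps[y])
        se = jaccard(side_effects[x], side_effects[y])
        if chem > chem_min and se < se_max:
            hits.append((x, y, round(chem, 2), round(se, 2)))
    return hits


fps = {
    "TB_1": {1, 2, 3, 4, 5, 6, 7, 8},
    "DRUG_A": {1, 2, 3, 4, 5, 6, 7, 9},
    "DRUG_B": {1, 2, 20, 21},
}
side_effects = {
    "TB_1": {"nausea", "rash", "headache"},
    "DRUG_A": {"nausea", "fatigue", "insomnia", "dizziness"},
    "DRUG_B": {"nausea", "rash"},
}

hits = candidate_pairs(fps, side_effects, tb_drugs={"TB_1"})
print(hits)
```

With real structures, RDKit's `MACCSkeys.GenMACCSKeys` and `DataStructs.TanimotoSimilarity` could supply the fingerprints and the chemical score in place of the toy bit sets.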

11 of 14

Overall Results

  TB Drug        Possible Candidate   DrugBank IDs
  amikacin       tobramycin           DB00479_DB00684
  amikacin       framycetin           DB00452_DB00479
  gatifloxacin   norfloxacin          DB01059_DB01044
  rifabutin      anidulafungin        DB00362_DB00615
  rifabutin      indinavir            DB00224_DB00615
  linezolid      capecitabine         DB00601_DB01101
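Each DrugBank IDs entry in the Overall Results table is two DrugBank accession numbers joined by an underscore, and the order inside the string is not consistent (compare DB00479_DB00684 with DB00452_DB00479). A minimal sketch that parses these keys into unordered pairs; the helper names parse_pair and shared_id are hypothetical, and the DB plus five digits pattern is assumed from the IDs shown here.

```python
# Minimal sketch: parse the "DrugBank IDs" column into order-independent pairs.
# parse_pair / shared_id are HYPOTHETICAL helpers; the ID pattern is assumed
# from the IDs shown in the Overall Results table.
import re

DRUGBANK_ID = re.compile(r"DB\d{5}")


def parse_pair(pair_id: str) -> frozenset[str]:
    """Split 'DBxxxxx_DByyyyy' into an unordered pair of DrugBank IDs."""
    left, sep, right = pair_id.partition("_")
    if not sep or not (DRUGBANK_ID.fullmatch(left) and DRUGBANK_ID.fullmatch(right)):
        raise ValueError(f"not a DrugBank ID pair: {pair_id!r}")
    return frozenset((left, right))


# Rows copied from the Overall Results table.
rows = [
    ("amikacin", "tobramycin", "DB00479_DB00684"),
    ("amikacin", "framycetin", "DB00452_DB00479"),
    ("gatifloxacin", "norfloxacin", "DB01059_DB01044"),
    ("rifabutin", "anidulafungin", "DB00362_DB00615"),
    ("rifabutin", "indinavir", "DB00224_DB00615"),
    ("linezolid", "capecitabine", "DB00601_DB01101"),
]


def shared_id(tb_drug: str):
    """ID common to every pair listed for a TB drug, or None if it has one row."""
    pairs = [parse_pair(p) for tb, _, p in rows if tb == tb_drug]
    return frozenset.intersection(*pairs) if len(pairs) > 1 else None


print(parse_pair("DB01044_DB01059") == parse_pair("DB01059_DB01044"))
print(shared_id("amikacin"), shared_id("rifabutin"))
```

Intersecting the pairs for a TB drug listed more than once recovers that drug's own ID from the table alone: amikacin's two rows share DB00479 and rifabutin's share DB00615.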

12 of 14

Demo

13 of 14

Future directions

  • Convert and integrate more biologically relevant datasets
    • Clinical Trials, Genomics and Proteomics Data
  • Improve user interface for biologists to explore the data
  • Continue collaborations between the groups

14 of 14

Thank you!

Questions?