1 of 14

STAGE Architecture Overview and User Narrative Breakout Prep

  • We have two core user narratives for WP2 but many, many for WP3
  • What’s lost in WP3’s user narratives is an architectural view of the system we’re building as a whole. We each see a piece of the overall system but, perhaps, not the whole.
  • Can we share each team’s vision of the DataSTAGE architecture and, as a result, better understand what we’re building together?

2 of 14

Proposed Activity

  1. [Before F2F] Break into Element teams
  2. [Before F2F] Each team draws out their understanding of the architecture of the system (which components exist, which group builds each component, which components “talk” to each other) by Oct 2019
  3. Each team walks through their architectural diagram
    1. C, Ca, He, Xe
  4. Finally, we create a consensus diagram that shows the components from various groups and how they relate
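
One way to picture the consensus step above is as a merge of the per-team diagrams. The sketch below is only an illustration under stated assumptions: the team names (team_a, team_b), the component names (auth, data_store, workflow_engine), and the dictionary layout are hypothetical placeholders, not the actual DataSTAGE components or owners. It combines each team's view, then flags components with more than one claimed owner and links that not every team drew.

```python
from collections import defaultdict


def merge_diagrams(diagrams):
    """Merge per-team diagrams into one view.

    diagrams maps a team name to a dict with:
      "components": {component name: group the team believes builds it}
      "links": [(component, component), ...] pairs the team believes "talk"
    """
    owners = defaultdict(set)  # component -> every owner claimed for it
    links = defaultdict(set)   # undirected link -> teams that drew it
    for team, diagram in diagrams.items():
        for component, owner in diagram["components"].items():
            owners[component].add(owner)
        for source, target in diagram["links"]:
            links[frozenset((source, target))].add(team)
    return dict(owners), dict(links)


def disputed_owners(owners):
    """Components for which the teams named more than one builder."""
    return {c: sorted(o) for c, o in owners.items() if len(o) > 1}


def unconfirmed_links(links, team_count):
    """Links that fewer than team_count teams included in their diagram."""
    return sorted(
        tuple(sorted(pair))
        for pair, teams in links.items()
        if len(teams) < team_count
    )


# Hypothetical placeholder input, for illustration only.
diagrams = {
    "team_a": {
        "components": {"auth": "team_a", "data_store": "team_b"},
        "links": [("auth", "data_store")],
    },
    "team_b": {
        "components": {
            "auth": "team_b",
            "data_store": "team_b",
            "workflow_engine": "team_b",
        },
        "links": [("data_store", "auth"), ("workflow_engine", "data_store")],
    },
}

owners, links = merge_diagrams(diagrams)
print(disputed_owners(owners))
print(unconfirmed_links(links, team_count=len(diagrams)))
```

The disputed owners and unconfirmed links are the natural discussion items for the walkthrough: they mark where the teams' pictures of the system differ.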

3 of 14

Brainstorming

Let’s brainstorm in this shared document

https://bit.ly/2ERUDye

No wrong answers, let’s collect ideas.

Goal: how systems relate to each other (not the details inside your system).

4 of 14

Xenon

Project Management

User Management

Authentication & Authorization

System Monitoring

Usage Logging

Notification Service

Backup Service

Billing Management

Web Application

API

Task Execution API

Data/Metadata Service

Cloud Storage & Compute

Resource Manager

Core Platform Infrastructure

Data Infrastructure

Independent Core Services

Task Execution Infrastructure

Task Scheduler

Job Management Layer

Orchestration Layer

5 of 14

FAIR4CURES

6 of 14

DataSTAGE

Powered by Seven Bridges

7 of 14

8 of 14

PIC-SURE API

Part 1) Phenotypic data preparation (before an investigator logs in)

Part 2) Phenotypic query in real time by an investigator across platforms

decrypt

TOPMED Data Coordinating Center Harmonization process

platform

platform

Real time synonym search

Carbon

9 of 14

tranSMART

PIC-SURE User Interface

i2b2 Core

PIC-SURE Auth Micro-app

PIC-SURE API V2

Fractalis

PIC-SURE HPDS

User Interfaces

Backend Services

Datastores

AuthN / AuthZ

Monitoring

AWS Account

Data Flow

Auth Flow

ETL Client

A) feasibility queries & cohort builder

B) Exploration to generate hypothesis

C) Analysis

DataStore

Relational DB

Oracle + MySQL

HPDS

All logs ingested by Splunk

i2b2/tranSMART

platform

Carbon

10 of 14

DataStore

Relational DB

MySQL

HPDS

ETL Client i2b2/TM 18.1

Data Types

Clinical

Registries

Exome

Genome

i2b2/tranSMART

platform

Carbon

11 of 14

Ca+ Architecture

TOPMED WORKFLOWS

TERRA WORKSPACE

CROMWELL WORKFLOW ENGINE

JUPYTER INTERACTIVE ANALYSIS

WINDMILL DATA EXPLORER

U CHICAGO Indexd

ORIGINAL METADATA

Metadata harmonization across TOPMed datasets

Load by reference with GUIDs

ORIGINAL DATA FILES

DOCKSTORE TOOL REPOSITORY

Workflows via TRS

Metadata via BDBag

Data files via DOS

Workflows relevant to the TOPMed community

AUTH

AUTH

AUTH

12 of 14

13 of 14

14 of 14

Next Steps

  • Diagram which component talks to which
  • Longer term views
    • Diagram that shows the ultimate destination… STAGE
    • Next one
  • User flows
    • Multiple flows
    • Not every component in every flow
    • Turn into tutorials ultimately
  • Data flows
    • How data flows through the system
  • Apps and deployment process into ATO’d systems
  • Show security boundaries and how components are deployed