1 of 90

GPC DevTeams: Tools and Techniques

  • HackathonSix Dec 2018
  • HackathonFive Feb 2018
  • HackathonFour January 2017

2 of 90

DevTeams at 12 Sites

  1. The University of Kansas Medical Center (KUMC)
  2. Indiana / Regenstrief (IU)
  3. University of Iowa Healthcare (UIOWA)
  4. The Medical College of Wisconsin (MCW)
  5. Marshfield Clinic (Wisconsin) (MCRF)
  6. The University of Missouri (MU)
  7. The University of Nebraska Medical Center (UNMC)
  8. The University of Texas Health Sciences Center at San Antonio (UTHSCSA)
  9. The University of Texas Southwestern Medical Center (UTSW)
  10. The University of Utah (UTAH)
  11. Allina Health
  12. Intermountain Health Care

See also: DevTeams

3 of 90

Tell us about you and your site

  • Names, Titles / Roles
  • Clinical Data Sources, Population
  • ETL Tools, Processes
    • To i2b2
    • To PCORNet (SAS datasets, SAS views)
    • Other? OMOP, REDCap, ...
    • Tools: RDBMS Version? SAS? Jenkins? Docker?
    • Hardware Sizing
  • Strengths of your approach

4 of 90

KUMC

  • People
    • PI: Russ Waitman, Director of Medical Informatics
    • Engineering Team Lead: Imran Hafeez
    • Software/Systems Engineers: Dan Connolly, Lav Patel, Daryl Budine
    • Investigators, Analysts, Honest Brokers: Sravani Chandaka, Maren Wennberg, Xing Song, Mei Liu
    • Project Management: Hillary Weedman, Steve Fennel, Brooklyn Winkel

5 of 90

HERON Architecture

6 of 90

KUMC: ordinary stuff

  • Bulk of data comes from Epic @ KU Hospital
    • Demographics, Diagnoses, Procedures, Lab results, Medications on ~500K patients
  • GE IDX clinic billing system
    • Phased out in favor of Epic Resolute billing
  • NAACCR Tumor Registry: ~60K cases
  • Linux, Python, R, Jenkins, Mercurial, Git
  • Oracle 12c on SUSE and Red Hat Linux
  • SAS 9.x on Windows, Linux

Docker: experimenting

7 of 90

KUMC Strengths

  • Self-service: ~1000 queries/month by ~40 people
  • REDCap for oversight workflow, data delivery
  • i2b2 is the data warehouse
    • Flowsheets (1B+ facts)
    • Microbiology, Alerts, …
    • UHC
    • Social Security Death Master File, NCDR, NTDS, …

60+ monthly releases going back to 2011

Share and Enjoy: kumc-bmi/heron on GitHub

Hardware sizing: 2.2 TB FusionIO solid-state storage

8 of 90

KUMC terms on Babel

9 of 90

KUMC: i2p-transform to CDM

  • Prototype: gpc-pcornet-cdm early 2015
  • Adapted as SCILHS/i2p-transform Dec 2016
  • kumc-bmi/i2p-transform
    • forked from SCILHS/i2p-transform Feb 2016
      • TODO: sync with ~70 upstream commits
      • TODO: migrate pre/post-processing kludges back into HERON ETL
  • multi-day run-time :-/
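
To make the i2b2 to CDM step concrete, here is a minimal Python sketch of the kind of mapping such a transform performs for diagnoses. It is not code from i2p-transform. The `ICD9:` / `ICD10:` concept-code prefixes, the helper name `fact_to_diagnosis`, and the choice to copy the fact's start date into ADMIT_DATE are assumptions for illustration; each site's i2b2 ontology uses its own prefixes, and a real transform fills many more columns.

```python
from datetime import date

# Assumed concept-code prefixes; each site's i2b2 ontology may differ.
DX_TYPE_BY_PREFIX = {"ICD9": "09", "ICD10": "10"}


def fact_to_diagnosis(fact):
    """Map one observation_fact-like dict to a CDM DIAGNOSIS-like dict.

    Returns None for facts that are not recognized as diagnoses.
    """
    prefix, _, code = fact["concept_cd"].partition(":")
    dx_type = DX_TYPE_BY_PREFIX.get(prefix)
    if dx_type is None or not code:
        return None
    return {
        "PATID": str(fact["patient_num"]),
        "ENCOUNTERID": str(fact["encounter_num"]),
        "DX": code,
        "DX_TYPE": dx_type,
        # Simplification: a real transform would take this from the encounter.
        "ADMIT_DATE": fact["start_date"].isoformat(),
    }


# Invented sample facts: two diagnoses and one lab result.
facts = [
    {"patient_num": 1, "encounter_num": 10, "concept_cd": "ICD9:250.00",
     "start_date": date(2015, 3, 2)},
    {"patient_num": 1, "encounter_num": 10, "concept_cd": "LOINC:2345-7",
     "start_date": date(2015, 3, 2)},
    {"patient_num": 2, "encounter_num": 11, "concept_cd": "ICD10:E11.9",
     "start_date": date(2016, 7, 9)},
]

diagnoses = [row for row in map(fact_to_diagnosis, facts) if row is not None]
for row in diagnoses:
    print(row)
```

The lab fact is skipped because its prefix is not in the lookup table, which is the same filtering a diagnosis-only load would need.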

10 of 90

Intermountain Team

  • Principal Investigator: Kirk Knowlton, MD
  • Co-Investigator: Benjamin Horne, PhD
  • Stakeholder Engagement: Samuel Brown, MD
  • Project Manager: Gary Bishop
  • Patient Representative: Chris Benda
  • Informatics Support:
    • Peter Haug, MD
    • Susan Rea, PhD
    • Justin Mundt, MS
    • Bart Dodds

11 of 90

Intermountain Healthcare

  • Not for Profit, Integrated Healthcare Organization
  • Twenty-three Hospitals
  • A Medical Group with more than 1,600 Physicians and Advanced Practice Clinicians
  • Approximately 180 Clinics
  • A Health Plans division (Select Health)
  • Cerner EHR
  • Internally Developed Electronic Data Warehouse

Our Mission: Helping people live the healthiest lives possible®

12 of 90

10/11/2018

Intermountain Patients by ZipCode 2006 – Present

13 of 90

EDW → AHR → CDM

  • Intermountain supports an Enterprise Data Warehouse (EDW) containing clinical and business data.
  • A cleansed and optimized extract called the Analytic Health Repository (AHR) was created for research beginning in 2009.
  • Our PCORnet CDM is largely an extract from the AHR.

14 of 90

EDW: A Source of Aggregated Data

[Diagram: data sources feed the EDW and its data marts, which are used through data access tools and analytic processes]

  • Data Sources: Finance, Internal, State/Federal, External, Lab, Claims, Pharmacy, EMR
  • EDW data marts: Primary Care, Rx, CV, Case Mix, Claims, other data marts
  • Data Access / Analytic Processes: direct SQL, desktop ODBC tools, web applications, statistical analysis, reporting tools, OLAP tools, data mining, others

15 of 90

Using EDW Data

[Diagram: uses of the EDW and its data marts]

  • EDW data marts: Primary Care, Rx, CV, Case Mix, Claims, other data marts
  • Uses: Financial Analysis, Mandated Reporting, Quality Management, Business Planning, Clinical Research Planning, Research Data Analysis, Genetic Epidemiology, Data Mining/Machine Learning, Process Tracking, etc.

16 of 90

Analytic Health Repository

L3 – Analytic

  • Clinical constructs (Registries, Scores, Quality measures)
  • Population-based views of disease progression, predictors & outcomes
  • Flexible, transparent and consistent
  • Gateway into Specialized Population-based Views

L2 – Building Blocks

  • Sources integrated and terminology standardized
  • Cleansed and Quality enforced
  • Exceptions reported
  • Complete data sets over time

L1 – Enterprise Data Warehouse

  • Contains source system data
  • Integrated using Data Bus Architecture
  • Common data attributes with standard names and data types

Enterprise ETL

Modeling Engines

  • L3: Concepts based on models. These represent high-level clinical or business categories that are implemented using a variety of knowledge-based tools (rule engines, ML, ontologies, etc.).
  • L2: Data restructured and optimized for clinically relevant searching. Carefully screened for errors, converted to national and international coding standards, complete for relevant data over time.
  • L1: The base data extracted directly from original data sources. Individual tables incomplete, using original coding and structure, with errors unaddressed.

17 of 90

Analytic Health Repository -> CDM Database

L3 – Analytic

  • Clinical constructs (Registries, Scores, Quality measures)
  • Population-based views of disease progression, predictors & outcomes
  • Flexible, transparent and consistent
  • Gateway into Specialized Population-based Views

L2 – Building Blocks

  • Sources integrated and terminology standardized
  • Cleansed and Quality enforced
  • Exceptions reported
  • Complete data sets over time

L1 – Enterprise Data Warehouse

  • Contains source system data
  • Integrated using Data Bus Architecture
  • Common data attributes with standard names and data types

Enterprise ETL

Modeling Engines

PCORnet CDM Database

18 of 90

10/11/2018

Intermountain Healthcare CDM Vital Statistics

Number of Patients

  • Demographic Table: 2,830,321
  • Care Provider Visits, Jul 2017 - Jun 2018: 1,469,350

Care Facilities in Utah, S. Idaho

  • Hospitals: 23
  • Clinics: 180
  • Diagnostic Services, Home Care, Rehab, Skilled Nursing, Telehealth
  • Primary Children's Medical Center

Number of Care Providers

  • Employed MD, PA, APRN, other Prof.: 2300
  • Affiliated practitioners: 3500

Programs of Interest

  • Select Health Plans claims: ~25% of patients
  • Tissue biorepository: specimens > 30 yrs, ~5 million
  • Linkages to Utah Population Data Base
  • CV Family History data base
  • Active Intermountain Cancer Registry

19 of 90

IU / Regenstrief

Team:

  • PI: ... Tim Imler (Umberto T.)
  • Engineering Team: Tony French, Jeff Stroup
  • Data Analyst: Ross Hayden
  • Project Manager: Dan Hood

20 of 90

IU/Regenstrief

Data Sources

  • Data fed from the Indiana Network for Patient Care (INPC, Indiana Health Information Exchange), specifically the IU Health and Eskenazi hospital systems
  • Certain GPC/PCORI deliverables have required data pulls directly from the IU Health database
  • NAACCR Tumor Registry
  • Supplemental procedures (Px) and diagnoses (Dx)

21 of 90

IU/Regenstrief

ETL Tools and Processes

  • Oracle views for de-identification
  • Oracle chains, stored procedures, and views for ETL from RMRS → i2b2
  • SCILHS i2p-transform for i2b2 → PCORnet CDM v3
  • SAS datasets for PCORnet (after a failed attempt at using Oracle views)

Tools:

  • Oracle 11g, SAS 9.4, RHEL 6/7, Java, Atlassian tool suite, Git
  • Docker (investigative), Ansible (investigative)

22 of 90

IU/Regenstrief

23 of 90

GPC partners

  • Leadership: Department of Epidemiology, College of Public Health
  • Clinical: University of Iowa Health Care
  • Informatics: Institute for Clinical and Translational Science (ICTS)

24 of 90

People

  • Betsy Chrischilles Principal Investigator
  • Ryan Carnahan Investigator
  • Mary Schroeder Investigator
  • Michael Wright Dev Lead
  • Gi-Yung Ryu Data Architect
  • Lucas Van Tol Technical Support
  • Brad McDowell Research Specialist
  • Brian Gryzlak Project Manager
  • Boyd Knosp GPC DROC Representative
  • Ashlee Wilson GPC DROC Representative (alt)

25 of 90

UIOWA strengths

  • Clinical Research Data Warehouse (CRDW) provides a normalized foundation to:
    • build other architectures (e.g. i2b2, CDM);
    • increase efficiencies in fulfilling data requests (e.g. TriNetX, PCORnet); and
    • feed custom datamarts (e.g. ORIEN, mother-child datamart, cancer datamart)
  • Prospective data collection
    • REDCap use rate is high, including MyCap
    • TeleForm for creating machine-readable forms and databases to contain the data
    • Iowa Personal Health Record for online consenting, collecting patient-reported data, and research management. Semi-automated interface with CRDW, REDCap, TeleForm, and the statewide cancer registry

26 of 90

Data flow supporting GPC/PCORnet requests

27 of 90

Data environment

  • MS SQL Server shop
  • Clinical Research Data Warehouse (CRDW) DB Server
    • Serves as i2b2 staging
    • Hardware: Dell R720xd
    • Processor: 2 x 6-core Xeon E5-2640 (12 cores total, 24 threads with hyperthreading)
    • Memory: 32GB RAM
    • Storage: ~20TB (mix of RAID1/5/6)
  • i2b2 Production DB Server
    • Same as CRDW but ~5TB of storage

28 of 90

Marshfield - People

  • PI: Jeff VanWormer, PhD
    • Robert Greenlee, PhD (Co-I)
  • PM: Judith Hase
  • Technical leads: Steff Roush, Lynda Kubacki-Meyer
  • Programmers
    • Steff Roush - i2b2 ETL, data mapping, QA, SAS
    • Eric LaRose - i2b2 ETL, CDM transform, SQL
    • Erica Scotty - PCORnet queries, SAS

29 of 90

Marshfield - Data Sources, etc.

  • Internally developed EHR: 20+ years of capture
    • Also: insurance plan, hospital (Cerner, Soarian), dental clinics, registries
  • Patient population: ~2.5M historic; ~400k unique/year
    • Central and Northern Wisconsin
  • Research data delivered via an “expert-mediated” model
    • 12 Research Programmer/Analysts
    • Netezza Data Warehouse, SAS 9.4
  • Software development stack is Microsoft
    • Windows, MS SQL server, C#, Azure DevOps

30 of 90

Marshfield - ETL Tools, Processes

  • Data Warehouse → Combined Tables
    • Process to de-duplicate sources
    • Netezza, SAS
    • Timing: Weekly
  • Combined Tables + supplemental data → i2b2
    • Netezza, SAS, C#; mapping IDs, formatting
    • Timing: Every 1-2 months, or as needed
  • i2b2 [SCILHS i2p-transform] → PCORnet CDM
    • MS SQL; mapping variables, formatting
    • Timing: After i2b2 build, based on Data Characterization schedule
    • *Limited Datasets* - date-shift on demand model
    • Considering moving away from SCILHS for CDM builds
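
The "date-shift on demand" idea can be illustrated with a small Python sketch: each patient gets one stable offset, and every date for that patient moves by the same number of days, so intervals between events are preserved. This is a generic scheme assumed for illustration, not Marshfield's actual implementation; the key, the offset range, and the function names are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET_KEY = b"example-only-key"  # hypothetical; a real key would be kept private
MAX_SHIFT_DAYS = 365              # assumed range, not taken from the slides


def shift_days(patient_id: str) -> int:
    """Derive a stable per-patient offset in 1..MAX_SHIFT_DAYS from a keyed hash."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1


def shift_date(patient_id: str, d: date) -> date:
    """Shift a date back in time by the patient's offset."""
    return d - timedelta(days=shift_days(patient_id))


# Invented example: a four-day stay for a made-up patient ID.
admit, discharge = date(2018, 3, 1), date(2018, 3, 5)
shifted = (shift_date("P001", admit), shift_date("P001", discharge))
print(shifted, (shifted[1] - shifted[0]).days)
```

Because the offset is derived rather than stored, the shift can be applied at extract time, which is one way to read "on demand"; the length of stay is still 4 days after shifting.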

31 of 90

Marshfield - i2b2 SQL Server Specs

Software: SQL Server 2012 SP1 (v11.0.3000.0)

RAM: 8.0 GB (Max SQL Server memory: 6,144 MB)

CPU: 4 Cores (Intel Xeon CPU E5-2690 v3 @ 2.60 GHz)

Disk:

  • C: (50 GB) – OS Only
  • D: (1.5 TB) – Data drive 1
  • E: (60 GB) – Log file location
  • F: (800 GB) – Data drive 2
  • G: (700 GB) – Data drive 3
  • H: (1 TB) – Data drive 4
  • I: (300 GB) - Data drive (tempDB only)
  • J: (700 GB) - Data drive 5

Notes:

  • We run two instances of i2b2 (live and down), which is why we have multiple data drives.
  • Extra space is needed for data loads and building indexes
    • Without a load in progress, free disk space: 39%, 100%, 13%, 8%, 12%, 100%, 16% (drives D-J).
    • At times during a load, the log files need to be extended over to some of the data drives
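
As a rough cross-check of the notes above, the free-space percentages can be converted to approximate gigabytes per drive. The sketch below is plain arithmetic on the figures from this slide; it assumes 1 TB = 1000 GB and that the seven percentages map to drives D through J in order.

```python
# Drive sizes in GB from the slide (1 TB taken as 1000 GB, an assumption).
SIZES_GB = {"D": 1500, "E": 60, "F": 800, "G": 700, "H": 1000, "I": 300, "J": 700}
# Free-space percentages reported without a load in progress, drives D-J in order.
FREE_PCT = {"D": 39, "E": 100, "F": 13, "G": 8, "H": 12, "I": 100, "J": 16}

free_gb = {drive: SIZES_GB[drive] * FREE_PCT[drive] / 100 for drive in SIZES_GB}

for drive, gb in free_gb.items():
    print(f"{drive}: {gb:.0f} GB free of {SIZES_GB[drive]} GB")

# Data drives only (excludes the log drive E and the tempDB drive I).
data_drives = ["D", "F", "G", "H", "J"]
data_free_gb = sum(free_gb[d] for d in data_drives)
print(f"Free on data drives: {data_free_gb:.0f} GB")
```

Seen this way, the tightest drives (G at 8%, H at 12%) have on the order of 50 to 120 GB free, which is consistent with the note that log files sometimes need to spill onto data drives during a load.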

32 of 90

UTHSCSA

  • The Team
    • Dr. Meredith Zozus (NEW -- YAY!!!) - Division Chief - PI
    • Alex Bokov - Faculty
    • Olivia Ellsmore (Suarez) - IRB Liaison and Project Manager
    • Laura Manuel - Software Engineer, Senior - Lead Programmer
    • Eric Moffett - App Sys Prog Sys Analyst, Associate - Programmer
    • *Coming soon* - Honest Broker

  • Informatics Leads for Family Weight and Health Survey (Obesity)

33 of 90

UTHSCSA

  • Clinical Data Sources
    • UT-Med Clinic Data
    • UHS Hospital/Clinic Data
  • ETL Tools, Processes
    • In-house code integrates UHS Sunrise (Allscripts) and UT-Med Clarity into i2b2
      • Coming soon: UHS Clarity
    • In-house code: Clarity/Sunrise → CDM

34 of 90

UTHSCSA Server Specs

Replacing SQL Server and CRC soon.

Further upgrades to New_stage also in the mix.

<3 new_stage

35 of 90

University of Nebraska Medical Center

Clinical partner: Nebraska Medicine

  • Two hospitals and multiple clinics
  • Level I trauma center
  • Newly opened Buffett Cancer Center

36 of 90

37 of 90

UNMC People

  • PI: James McClay, MD
  • Project Manager: Carol Geary, PhD
  • Informatics Standards: James Campbell, Scott Campbell
  • ETL programming: Yeshwanth Narayana, Jay Pedersen
  • Architecture & System Design: Ashok Mudgapalli, Research Information Technology Office (RITO) Director
  • RITO Technical Staff

38 of 90

39 of 90

UNMC Architecture

  • Epic → Epic Clarity → research copy of Epic Clarity (Research Clarity)
  • Research Clarity is MS SQL
  • Oracle Gateway interfaces between MS SQL and Oracle
  • i2b2 instances: Linux, Oracle 11g
  • PostGIS on PostgreSQL

40 of 90

UNMC Strengths

  • Excellent relations with Clinical Partner
  • IDeA Clinical Translational Research Site
  • Vocabulary Standards
  • Encoding anatomic pathology and microbiology into i2b2

41 of 90

Developing ONC Standards Metadata for i2b2 Interoperability

  • Collaborating with NLM to develop historically complete metadata for RxNorm/NDC
  • Coordinating an extension to the SCILHS metadata (labs, meds, diagnoses) with Partners to include GPC partner labs and meds
  • Collaborating with the Ontologies workgroup of the i2b2 tranSMART Foundation

42 of 90

Extending LOINC and SNOMED CT for Precision Medicine

  • Developing description-logic supported ontology for lab/pathology/clinical observation results (LOINC and SNOMED CT observables)
  • Developing results reporting for molecular pathology and genomics
  • Developing HL7 V2 interface from pathology/sequencing labs to Epic for real-time cancer data

43 of 90

UTSW People

  • PI: Lindsay Cowell, Associate Professor, Division of Informatics

  • Technical Team: Phillip Reeder, Jennifer Cai

  • Honest Brokers: Teresa Bosler, Shiby Antony

  • Project Management: Shiby Antony

44 of 90

UTSW Architecture

45 of 90

UTSW Data Sources, Tools

Data Sources

  • Bulk of data comes from UTSW Epic (Clarity)
    • Demographics, Diagnoses, Procedures, Lab results, Medications ~1.3B observations on ~5.5M patients
  • Billing System (Siemens, Epic Resolute, IDX [data prior to 2009])
  • Volunteer Patient Registry
  • NAACCR Tumor Registry: ~60K cases

Tools

  • Linux
  • Oracle 11g
  • SAS 9.4 on Windows

46 of 90

UTSW i2b2 Terminology

47 of 90

UTSW Strengths

  • UT Southwestern Clinical Research Data Warehouse: access to standardized research data in addition to the i2b2: Bio-specimen data, Volunteer Patient Registry data, and Clinical Research Study data.
  • REDCap for data requests, delivery.
  • Import of clinical genomic data into i2b2.
  • Cancer gene connect - addition of sequencing data to i2b2 using Sequence Ontology.
  • REDCap FHIR integration with Epic.

48 of 90

University of Missouri (MU)

49 of 90

MU Team

  • Abu Mosa, PhD: Principal Investigator, Honest Broker
  • Vasanthi Mandhadi: Project Manager, Honest Broker, Informatics Team
  • Todd McNeeley: Informatics Team
  • Marshall Gorski: Informatics Team
  • Cory Gassner: Informatics Team
  • Kamruz Zaman Rana: Informatics Team
  • Noelle Al-Khashti: Informatics Team
  • Jeff Ordway: Patient Advisor
  • Lynne Lawrence: Patient Engagement Officer
  • Lori Wilcox: IRB Representative
  • Jenelle Greaning: IRB Representative
  • William Stephens: Patient Advisor

50 of 90

MU Data Sources

  • Cerner Millennium EMR
  • IDX Billing
  • Cerner PathNet
  • Cerner PharmNet
  • Cancer Registry - Ellis Fischel Cancer Center = ~11K
  • SSDMF
  • ACS
  • Total # of Patients = ~635,245

51 of 90

MU i2b2/CDM ETL

52 of 90

53 of 90

MU Tools

  • SAS 9.4
  • MS SQL
  • Pentaho Data Integration (ETL)
    • Java
    • ECMAScript
  • SSIS (ETL)
  • ActiveBatch - cross platform job scheduling
  • REDCap - approved for storing identified data

54 of 90

MU Hardware

  • KC (Cerner)
    • Oracle on Linux
      • 24 cores
      • 192 GB RAM
  • MU
    • SAS on Windows Server (MU SOM)
      • 12 cores
      • 64 GB RAM
    • SAS on Linux Server (MU research computing)
      • HPC3
      • 4 nodes
      • 96 cores
    • SQL Server on Windows (Tiger Institute)
      • 16 cores
      • 256 GB RAM
      • 8 TB storage

55 of 90

MU Strengths

  • Large established user-base leveraging i2b2 for research
  • MU-iCATS - partner of WashU CTSA
  • A newly established Center for Biomedical Informatics (CBMI)
  • MU Center for Patient-Centered Outcomes Research (AHRQ funded)
  • Recurring workshops on the use of informatics tools and applications
  • Centralized and integrated data governance for research data requests and research data brokers
  • Strategic support for i2b2 at the EMR vendor (Cerner)
  • Tiger Institute for Health Innovation - joint MU/Cerner partnership including a research focus
  • Piloting de-identification and feature extraction tasks using NLP pipeline on pathology and radiology reports
  • Collaboration with MU EECS department for automation of research data request and governance process using AI Chatbot and blockchain

56 of 90

MU Areas to Improve

  • Leverage GPC for funding opportunities
  • Free-text notes not available in i2b2
  • Resolving investigative checks in EDC

57 of 90

Medical College of Wisconsin

58 of 90

Medical College of Wisconsin

  • Main academic campus & practice in Milwaukee
  • 2 satellite academic campuses in northern Wisconsin
  • Provides professional services to Froedtert Health System and Children’s Hospital of Wisconsin
  • Staffs 5 hospitals and dozens of clinics statewide
  • Provides primary, tertiary and Level 1 trauma care

59 of 90

MCW Dev Team

  • PI: Brad Taylor, CRIO
  • PM/Honest Broker: Kris Osinski, Business Analyst
  • Software Engineers
    • Alex Stoddard - ETL, CDM lead
    • Andrew Vallejos & George Kowalski - i2b2, ACT, PopMedNet, SHRINE, Notes De-id pipeline leads
    • Weihong Jin - ETL/Honest Broker

60 of 90

MCW Data Sources/Population

  • Epic EHR
    • Froedtert (2B facts on ~1.2M patients)
    • Children’s (200M facts on ~600K patients)
  • GE/IDX physician billing (1999+), converted to Epic 12/1/2018
  • Froedtert Health System hospital billing (Epic 2012+, Affinity legacy)
  • NAACCR tumor registry
  • MCW Tissue Bank biospecimens (OnCore)
  • Foundation Medicine genetics data
  • *New* Mosaiq radiation treatment dosimetry data elements

61 of 90

MCW ETL/DW Architecture

Notes:

i2b2 & CDM are built from same source data now

Converting from Oracle db to PostgreSQL 11 in 2018

Added discrete data from Foundation Medicine genetic testing result reports

62 of 90

MCW Strengths/Opportunities

  • Strengths
    • Well-curated CDM
    • TriNetX Query Tool/Trial Connect/Research Network
    • Unique self-service Honest Broker data extract process
    • Expanded i2b2 to adopt ACT/SHRINE ontology/network
    • Participating in All of Us Program with OMOP data model
  • Roadblocks/Opportunities for Improvement
    • Getting access to enriching 3rd party/legacy data is difficult
    • Adding notes & flowsheet data back into i2b2 - high de-id volume
    • Improving visibility across CTSA research community
    • Expanding our exposure in the CTSA RIC/TIC/TIN

63 of 90

Univ of Utah

  • Colin Moynier - REDCap
  • Ainsley Huffman - project manager
  • Reid Holbrook - IT
  • Molly Conroy
  • Rachel Hess - PI

64 of 90

2 Hospitals

12 Community Clinics

5+ Specialty Centers

1 Electronic Health Record System (+3 historical)

200+ Ancillary Systems

1 Enterprise Data Warehouse

65 of 90

66 of 90

Current CDM Status

67 of 90

Appendix: Former Sites

68 of 90

U. C. Davis (guests)

  • Bill Riedl
  • SHRINE
    • Config mgmt code - source

69 of 90

Children’s Mercy Hospital (CMH)

Team

  • Mark Hoffman - PhD, PI
  • Warren Teachout - Director of Decision Support, Honest Broker, PM
  • Rita Fothergill - Software Architect, Programmer
  • Sierra Martin - Programmer (Python, REDCap)
  • Cerner Team: Claire Maples, Aaron Meyer, Paul Albright, Gary Gasperino, Tina McKaig

Clinical Data Sources, Population

  • Cerner Millennium Applications - clinical data, registration, pharmacy, lab, radiology
    • Total # of patients: ~500,000
  • GE Billing system/Meditech historical billing (from 2008)
  • NAACCR Tumor registry (from 1995)
    • Total # of patients: ~3,500

70 of 90

CMH Tools, Processes

Processes

  • Cerner Millennium -> HealthFacts -> i2b2
    • Refresh every 2 weeks
  • i2b2 [SCILHS i2p-transform] -> PCORnet CDM

Tools

  • Oracle 11g
  • SAS 9.4
  • REDCap
  • R, Python, SQL
  • Windows, C#, TFS, Git
  • MS SQL server

71 of 90

CMH - Health Facts Workflow

72 of 90

CMH Strengths and Areas to Improve

Strengths

  • Non-human subjects designation for de-identified node of i2b2 (means researchers can carry out retrospective data studies without additional IRB approval)
  • Internal DROC in place to expedite GPC or PCORnet de-identified data requests
  • Cerner supports and enhances i2b2 focusing on PCORNet and GPC requirements

Areas to Improve

  • Availability of orders, especially medications
  • Ensuring drug mappings from EMR to i2b2 and the CDM are complete
  • Ability to easily incorporate registries

73 of 90

UMN

PIs:

  • Genevieve Melton-Meaux, MD, PhD, FACS, FASCRS, FACMI
  • Constantin F. Aliferis, MD, PhD, FACMI

Analysts:

  • Ahmad Abusalah, MSc, PhD (Director of Clinical Informatics)
  • Gretchen Sieger (Lead Analyst)
  • Sonya Grillo (Clinical Data Expert)
  • Kathleen McKay, PhD (PM and HB)

Project Managers:

  • Kathleen McKay, PhD

Programmers:

  • Tim Meyer
  • Luke Bicknese
  • Duy Duong
  • Andrew Hangsleben

74 of 90

UMN

Tools:

  • Oracle (Clinical Data Warehouse)
  • SQL Server (i2b2 & PCORnet CDM)
  • Pentaho Data Integration (Kettle)
  • Jenkins Continuous Integration
  • i2b2 & SHRINE

Primary Data Sources:

  • EMR Data from Fairview Health Services
  • Claims Data from University of Minnesota Physicians
  • Tumor Registry from 4 hospitals
  • Death Records (MN)

Enrichment:

  • Data Modeling
  • Terminology Management
  • Code Mapping
  • Geocoding
  • Patient Matching
  • Ontology
  • Free-Text Notes Indexing

75 of 90

UMN

Clinical Data Management and Integration System

Metadata Management

Schema Management

Table Management

Row Counts & Load Information

Columns

Stats & Profiling

Relationships

Feeds Management

Define Destination, Source Query, Type of Load

Data Request Management

Patient Sets

Queries

Users (Integration with Active Directory)

De-Identification

76 of 90

WISC

  • People
    • PIs
      • Mark Drezner
      • Umberto Tachinardi
    • Other Useful folks
      • Sarah Esmond
      • Laura Ladick

    • Geeks
      • Tom Mish
      • Don Steger
      • Debbie Yoshihara
      • Yiqiang Song

77 of 90

WISC

78 of 90

WISC

79 of 90

HackathonFour Dev Recap

Goals:

  • SNOW SHRINE
    • Toward interactive federated query
  • Unstructured Notes De-identification
    • From multiple sites
  • GROUSE
    • CMS claims data integration
    • Aka “RESDAC”
  • Roadmap, plans, peer-to-peer learning...

80 of 90

Shrine flow

  1. George logs into the SHRINE web client (Tomcat) blueberry (in the DMZ) and gets a session ID, X123.
    • Blueberry authenticates and authorizes him (qualified investigator, trained, …) using the PM cell in i2b2.mcw.edu (internal). George is on project P67.
  2. George issues a query (in the SHRINE ontology) from his web browser to blueberry using X123 (submit).
  3. Blueberry forwards it to the hub (snow-hub.wisc.edu).
    • Certs and IP addresses have been exchanged between blueberry and snow-hub.
  4. The hub forwards it to the adapter at each site, e.g. KUMC.
  5. The KUMC adapter (blobfish, in the DMZ) maps paths to the KUMC ontology.
  6. Blobfish forwards to shrine-hive.kumc.edu (internal). (IU has an alternative: different usernames; UMN uses different projects.)
  7. Shrine-hive computes counts as usual and returns them to blobfish.
  8. Blobfish returns the counts to snow-hub, then blueberry, then George’s browser.
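
The round trip above can be sketched as a toy simulation in Python. Nothing here is SHRINE's actual API: the classes, method names, ontology paths, and counts are all invented for illustration, and authentication details, certificates, and networking are left out. It only shows the shape of the flow: web client to hub, hub to each site adapter, adapter mapping network paths to local paths, a local count, and results flowing back.

```python
class SiteAdapter:
    """Stands in for a site's adapter plus its internal i2b2 hive (e.g. KUMC)."""

    def __init__(self, name, path_map, local_counts):
        self.name = name
        self.path_map = path_map          # network ontology path -> local path
        self.local_counts = local_counts  # local path -> patient count

    def run(self, network_path):
        # Step 5: map the network (SHRINE) path to the site's own ontology.
        local_path = self.path_map.get(network_path)
        if local_path is None:
            return {"site": self.name, "count": None, "error": "unmapped term"}
        # Steps 6-7: the internal hive computes the count as usual.
        return {"site": self.name, "count": self.local_counts.get(local_path, 0)}


class Hub:
    """Step 4: forwards each query to every registered adapter."""

    def __init__(self, adapters):
        self.adapters = adapters

    def broadcast(self, network_path):
        return [adapter.run(network_path) for adapter in self.adapters]


class WebClient:
    """Steps 1-3 and 8: checks the session, forwards to the hub, returns results."""

    def __init__(self, hub, sessions):
        self.hub = hub
        self.sessions = sessions  # session id -> user name

    def submit(self, session_id, network_path):
        if session_id not in self.sessions:
            raise PermissionError("unknown session")
        return self.hub.broadcast(network_path)


# Hypothetical paths and counts, invented for this example.
DIABETES = r"\\SHRINE\Diagnoses\Diabetes"
kumc = SiteAdapter("KUMC", {DIABETES: r"\\KUMC\Dx\E11"}, {r"\\KUMC\Dx\E11": 120})
other = SiteAdapter("OTHER", {}, {})

client = WebClient(Hub([kumc, other]), sessions={"X123": "George"})
results = client.submit("X123", DIABETES)
for result in results:
    print(result)
```

The second site has no mapping for the term, so it reports an error instead of a count; keeping the path mapping at each site's adapter is what lets every site retain its own local ontology.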

81 of 90

SHRINE status poll

Moved to meeting notes

82 of 90

SNOW SHRINE

Keith / WISC gave a presentation

Hacked with Lav @KUMC

83 of 90

Unstructured Notes De-identification

Within the last year, MCW has developed a de-identification web site: https://cis.ctsi.mcw.edu/

84 of 90

Unstructured Notes De-identification

85 of 90

Unstructured Notes De-identification

86 of 90

Unstructured Notes De-identification

Code base has continued to receive updates over the last year

87 of 90

Unstructured Notes De-identification

We have 4 repos. The first is the main repo of buildable source code and dependencies:

https://bitbucket.org/MCW_BMI/notes-deidentification

Next is the Docker repo, which can be used to create a self-contained Docker instance that runs the code above against a database:

https://bitbucket.org/MCW_BMI/notes-deidentification-docker

We will have a pre-compiled repo, with no source code, that one can pull down and run against a database:

https://bitbucket.org/MCW_BMI/notes-deidentification-standalone

Finally, we will have a repo of example code that calls the de-identification web site described earlier (https://cis.ctsi.mcw.edu/) to de-identify small sets of notes:

https://bitbucket.org/MCW_BMI/notes-deidentification-web-services

One can run this code in any Python environment.

88 of 90

GROUSE record linkage, scenarios

89 of 90

GROUSE Finder Files

  • MU, IU hunted down HICs

90 of 90

Thanks, everybody!

Especially Hillary!