GPC DevTeams: Tools and Techniques
HackathonSix Dec 2018
HackathonFive Feb 2018
HackathonFour January 2017
DevTeams at 12 Sites
See also: DevTeams
Tell us about you and your site
KUMC
HERON Architecture
KUMC: ordinary stuff
Docker: experimenting
KUMC Strengths
60+ monthly releases going back to 2011
Share and Enjoy: kumc-bmi/heron on GitHub
HardwareSizing: 2.2TB FusionIO solid state storage
KUMC terms on Babel
KUMC: i2p-transform to CDM
Intermountain Team
Intermountain Healthcare
Our Mission: Helping people live the healthiest lives possible®
10/11/2018
Intermountain Patients by ZipCode 2006 – Present
EDW → AHR → CDM
EDW: A Source of Aggregated Data
Data Sources
Finance
Internal
State/Federal
External
Lab
Claims
Pharmacy
EMR
Data Access
EDW
Primary Care Data Mart
Rx Data Mart
Other Data Marts
CV Data Mart
Case Mix Data Mart
Claims Data Mart
Analytic Processes
Direct SQL
Desktop ODBC tools
Web applications
Statistical Analysis
Reporting Tools
OLAP Tools
OTHERS
Data Mining
Using EDW Data
EDW
Primary Care Data Mart
Rx Data Mart
Other Data Marts
CV Data Mart
Case Mix Data Mart
Claims Data Mart
Financial Analysis
Mandated Reporting
Quality Management
Business Planning
Clinical
Research
Planning
Research
Data Analysis
Genetic Epidemiology
Etc.
Data Mining/Machine Learning
Process Tracking
Analytic Health Repository
L3 – Analytic: Concepts based on models. These represent high-level clinical or business categories that are implemented using a variety of knowledge-based tools (rule engines, ML, ontologies, etc.).
L2 – Building Blocks: Data restructured and optimized for clinically relevant searching. Carefully screened for errors, converted to national and international coding standards, and complete for relevant data over time.
L1 – Enterprise Data Warehouse: The base data extracted directly from original data sources. Individual tables are incomplete and use original coding and structure, with errors unaddressed.
Enterprise ETL
Modeling Engines
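The three layers described above (L1 base data, L2 cleaned and standardized building blocks, L3 model-based concepts) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not Intermountain's implementation: the row layout, the `LOCAL_TO_STANDARD` map, the function names, and the HbA1c rule are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical local-to-standard code map; a real system would use
# terminology services rather than a hard-coded dictionary.
LOCAL_TO_STANDARD = {"GLU-01": "LOINC:2345-7", "HBA1C": "LOINC:4548-4"}


@dataclass(frozen=True)
class Observation:
    patient_id: str
    code: str
    value: float


def l1_extract(rows):
    """L1: keep source rows as-is (original coding, errors unaddressed)."""
    return [dict(r) for r in rows]


def l2_build(l1_rows):
    """L2: screen out bad rows and convert local codes to standard ones."""
    out = []
    for r in l1_rows:
        code = LOCAL_TO_STANDARD.get(r.get("code"))
        try:
            value = float(r.get("value"))
        except (TypeError, ValueError):
            continue  # unparseable value: dropped during screening
        if code is None or not r.get("patient_id"):
            continue  # unmapped code or missing patient identifier
        out.append(Observation(r["patient_id"], code, value))
    return out


def l3_concepts(l2_obs, hba1c_threshold=6.5):
    """L3: derive a high-level concept with a toy rule (illustrative only)."""
    return sorted({o.patient_id for o in l2_obs
                   if o.code == "LOINC:4548-4" and o.value >= hba1c_threshold})
```

A call such as `l3_concepts(l2_build(l1_extract(rows)))` walks one batch of source rows up through all three layers. The rule in `l3_concepts` stands in for whatever rule engine, ML model, or ontology a site would actually use.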
Analytic Health Repository → CDM Database
L3 – Analytic
L2 – Building Blocks
L1 – Enterprise Data Warehouse
Enterprise ETL
Modeling Engines
PCORnet CDM Database
Number of Patients
Demographic Table: 2,830,321
Care provider visits, Jul 2017 – Jun 2018: 1,469,350
Intermountain Healthcare CDM Vital Statistics
Care Facilities in Utah, S. Idaho
Hospitals 23
Clinics 180
Diagnostic Services, Home Care, Rehab, Skilled Nursing, Telehealth
Primary Children's Medical Center
Number of Care Providers
Employed MD, PA, APRN, other professionals: 2,300
Affiliated practitioners: 3,500
Programs of Interest
SelectHealth plan claims: ~25% of patients
Tissue biorepository: ~5 million specimens collected over more than 30 years
Linkages to Utah Population Database
CV Family History database
Active Intermountain Cancer Registry
IU / Regenstrief
Team:
PI-...Tim Imler (Umberto T.)
Engineering Team-Tony French, Jeff Stroup
Data Analyst-Ross Hayden
Project Manager-Dan Hood
IU/Regenstrief
Data Sources
-Data fed from the Indiana Network for Patient Care (INPC, Indiana Health Information Exchange), specifically the IU Health and Eskenazi Hospital systems
-Certain GPC/PCORI deliverables have required data pulls directly from the IU Health database
-NAACCR Tumor Registry
-Supplemental procedures (Px) and diagnoses (Dx)
IU/Regenstrief
ETL Tools and Processes
-Oracle Views for De-identification
-Oracle chains, stored procedures, and views for ETL from RMRS → i2b2
-SCILHS i2p-transform for i2b2 → PCORnet CDM v3
-SAS datasets for PCORnet (after a failed attempt at using Oracle views)
Tools:
Oracle 11g, SAS 9.4, RHEL 6/7, Java, Atlassian Tool Suite, Git, Docker (investigative), Ansible (investigative)
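The view-based de-identification listed above can be illustrated with a small sketch. It uses SQLite in place of Oracle so it runs anywhere, and it is not the IU/Regenstrief implementation: the `visit` and `patient_map` tables, the `visit_deid` view, the salt, and the choice to coarsen dates to the year are all hypothetical. Real de-identification needs far more than this (for example, handling of free text and rare values).

```python
import hashlib
import sqlite3

SALT = "example-salt"  # hypothetical; a real salt would be kept secret


def pseudonym(mrn):
    """Salted one-way hash used as a stand-in patient key."""
    return hashlib.sha256((SALT + str(mrn)).encode()).hexdigest()[:16]


def build_demo_db():
    """Create an in-memory database with an identified table and a de-identified view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE visit (mrn TEXT, name TEXT, visit_date TEXT, dx TEXT);
        CREATE TABLE patient_map (mrn TEXT PRIMARY KEY, patient_key TEXT);
        INSERT INTO visit VALUES ('1001', 'Jane Doe', '2018-03-14', 'E11.9');
        INSERT INTO visit VALUES ('1001', 'Jane Doe', '2018-06-02', 'I10');
        -- The view omits the name, swaps the MRN for a pseudonymous key,
        -- and keeps only the year of the visit date.
        CREATE VIEW visit_deid AS
        SELECT m.patient_key,
               substr(v.visit_date, 1, 4) AS visit_year,
               v.dx
        FROM visit v
        JOIN patient_map m ON m.mrn = v.mrn;
    """)
    # Populate the crosswalk from the identified table.
    mrns = [row[0] for row in conn.execute("SELECT DISTINCT mrn FROM visit")]
    conn.executemany(
        "INSERT INTO patient_map VALUES (?, ?)",
        [(m, pseudonym(m)) for m in mrns],
    )
    return conn
```

Downstream queries would then be granted access to `visit_deid` only, so the identified columns never leave the source schema.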
IU/Regenstrief
GPC partners
People
UIOWA strengths
automated interface with CRDW, REDCap, TeleForm, statewide cancer registry
Data flow supporting GPC/PCORnet requests
Data environment
Marshfield - People
Marshfield - Data Sources, etc.
Marshfield - ETL Tools, Processes
Marshfield - i2b2 SQL Server Specs
Software: SQL Server 2012 SP1 (v11.0.3000.0)
RAM: 8.0 GB (Max SQL Server memory: 6,144 MB)
CPU: 4 Cores (Intel Xeon CPU E5-2690 v3 @ 2.60 GHz)
Disk:
Notes:
UTHSCSA
UTHSCSA Server Specs
Replacing SQL Server and CRC soon.
Further upgrades to New_stage also in the mix.
<3 new_stage
University of Nebraska Medical Center
Clinical partner, Nebraska Medicine.
-Two hospitals and multiple clinics
-Level I trauma center
-Newly opened Buffett Cancer Center
UNMC People
UNMC Architecture
UNMC Strengths
Developing ONC Standards Metadata for i2b2 Interoperability
Extending LOINC and SNOMED CT for Precision Medicine
UTSW People
UTSW Architecture
UTSW Data Sources, Tools
Data Sources
Tools
UTSW i2b2 Terminology
UTSW Strengths
University of Missouri (MU)
MU Team
Abu Mosa, PhD: Principal Investigator, Honest Broker
Vasanthi Mandhadi: Project Manager, Honest Broker, Informatics Team
Todd McNeeley: Informatics Team
Marshall Gorski: Informatics Team
Cory Gassner: Informatics Team
Kamruz Zaman Rana: Informatics Team
Noelle Al-Khashti: Informatics Team
Jeff Ordway: Patient Advisor
Lynne Lawrence: Patient Engagement Officer
Lori Wilcox: IRB Representative
Jenelle Greaning: IRB Representative
William Stephens: Patient Advisor
MU Data Sources
MU i2b2/CDM ETL
MU Tools
MU Hardware
MU Strengths
MU Areas to Improve
Medical College of Wisconsin
MCW Dev Team
MCW Data Sources/Population
MCW ETL/DW Architecture
Notes:
i2b2 & CDM are built from same source data now
Converting from Oracle db to PostgreSQL 11 in 2018
Added discrete data from Foundation Medicine genetic testing result reports
MCW Strengths/Opportunities
Univ of Utah
•2 Hospitals
•12 Community Clinics
•5+ Specialty Centers
•1 Electronic Health Record System (+3 historical)
•200+ Ancillary Systems
•1 Enterprise Data Warehouse
•Current CDM Status
Appendix: Former Sites
U. C. Davis (guests)
Children’s Mercy Hospital (CMH)
Team
Clinical Data Sources, Population
CMH Tools, Processes
Processes
Tools
CMH - Health Facts Workflow
CMH Strengths and Areas to Improve
Strengths
Areas to Improve
UMN
PIs:
Genevieve Melton-Meaux, MD, PhD, FACS, FASCRS, FACMI,
Constantin F. Aliferis, MD, PhD, FACMI
Analysts:
Ahmad Abusalah, MSc, PhD (Director of Clinical Informatics)
Gretchen Sieger (Lead Analyst)
Sonya Grillo (Clinical Data Expert)
Kathleen McKay, PhD (Project Manager and Honest Broker)
Project Managers:
Kathleen McKay, PhD
Programmers:
Tim Meyer
Luke Bicknese
Duy Duong
Andrew Hangsleben
UMN
Tools:
Oracle (Clinical Data Warehouse)
SQL Server (i2b2 & PCORnet CDM)
Pentaho Data-Integration (Kettle)
Jenkins Continuous Integration
i2b2 & SHRINE
Primary Data Sources:
EMR Data from Fairview Health Services
Claims Data from University of Minnesota Physicians
Tumor Registry from 4 hospitals
Death Records (MN)
Enrichment:
Data Modeling
Terminology Management
Code Mapping
Geocoding
Patient Matching
Ontology
Free-Text Notes Indexing
UMN
Clinical Data Management and Integration System
Metadata Management
Schema Management
Table Management
Row Counts & Load Information
Columns
Stats & Profiling
Relationships
Feeds Management
Define Destination, Source Query, Type of Load
Data Request Management
Patient Sets
Queries
Users (Integration with Active Directory)
De-Identification
WISC
HackathonFour Dev Recap
Goals:
Shrine flow
SHRINE status poll
Moved to meeting notes
SNOW SHRINE
Keith / WISC gave a presentation
Hacked with Lav @KUMC
Unstructured Notes De-identification
Within the last year, MCW has developed a de-identification web site: https://cis.ctsi.mcw.edu/
The code base has continued to receive updates over the last year.
We have 4 repos. The first is the main repo of buildable source code and dependencies:
https://bitbucket.org/MCW_BMI/notes-deidentification
The second is the Docker repo, which can be used to create a self-contained Docker instance that runs the code above against a database:
https://bitbucket.org/MCW_BMI/notes-deidentification-docker
The third will be a pre-compiled repo with no source code that one can pull down and run against a database:
https://bitbucket.org/MCW_BMI/notes-deidentification-standalone
Finally, we will have a repo of example code that calls the de-identification web site described earlier (https://cis.ctsi.mcw.edu/) to de-identify small sets of notes:
https://bitbucket.org/MCW_BMI/notes-deidentification-web-services
This code can be run in any Python environment.
GROUSE record linkage, scenarios
GROUSE Finder Files
Thanks, everybody!
Especially Hillary!