1 of 50

Tech Talk 4 - Platforms for Sensitive Data Analysis:

SeRP, AIS, SDE, AusCAT, ERICA, AARNet, OMOP PSB.

15 APRIL 2021

TIM CHURCHES (ERICA)

Snr Research Fellow, Health Data Science, UNSW Medicine

timothy.churches@unsw.edu.au

FRANKIE STEVENS (AARNet)

AARNet Associate Director, eResearch

frankie.stevens@aarnet.edu.au

DOUGIE BOYLE (OMOP PSB)

Prof, Health Data Science, The University of Melbourne

dboyle@unimelb.edu.au

MATTHEW FIELD (AusCAT)

Research Fellow, UNSW Medicine

matthew.field@unsw.edu.au

ANITHA KANNAN (SeRP)

Director Research Platform, Monash University

anitha.kannan@monash.edu

RYAN SULLIVAN (AIS)

Product Specialist - Characterization

ryan.sullivan@sydney.edu.au

AMR HASSAN (SDE)

Delivery Leader TS and eResearch, Monash University

amr.hassan@monash.edu

Guest Chair

Dr Steven McEachern

Director Australian Data Archive, ANU

steven.mceachern@anu.edu.au

ARDC is enabled by NCRIS.

2 of 50

Questions for each speaker to address

Overview

What is your key business / use case for analysis of sensitive data?

What technical / implementation constraints did this create, whether as a result of governance, security, data format or management issues particular to the sensitive data involved?

What was your solution (technology/architecture) to these constraints?

4 of 50

1.1. Key business/use case for analysis of sensitive data (SeRP)

Response

Scalable Data Governance

Data Governance includes the policies, processes, roles and responsibilities around data collection, management, use and protection.

Impact of barriers to Data Governance

Valuable research data is locked up in silos

Challenges

Data governance at scale can be resource intensive & challenging

There are no commercial solutions or national platforms for data governance that provide a secure, trusted and scalable environment that leverages existing institutional research infrastructure

5 of 50

1. SeRP - Background

  • Multi-million pound investments from Medical Research Council (MRC), Economic & Social Research Council (ESRC) and Welsh Government
  • Flagship research projects
    • SAIL Databank: The broadest & most accessible source of anonymised population data in the world
    • ADR UK: UK’s public sector data for policy development
    • Dementia Platform UK: Platform with records of over 3 million people from 47 long-term studies

6 of 50

SeRP - Background

Monash SeRP Infrastructure

Building Blocks

  • Enterprise Systems
    • MFA
    • VPN
    • Monash Active Directory
      • User Directory
      • Domain Registration
      • System Center Configuration Manager
  • Hosting Platform
    • Monash Research Cloud
      • Controllers and VDIs
    • Research Data Storage
      • Projects and Users Shares
  • VDI Solution
    • Leostream
      • HTML5 Remote Desktop

7 of 50

SeRP - Background

Monash SeRP Security

  • Implemented as a result of Penetration Testing and Security Risk Assessment
  • SeRP instances inherit security GPOs (regular security patching and monitoring)

8 of 50

SeRP - Background

Monash SeRP Workflow

9 of 50

SeRP - Background

Monash SeRP Features (Custodians)

Data custodian governed:

  • File in and file out approval process
  • Customisable roles and permission levels
  • Dataset upload and associated quality checks/reports

Modules/features in development:

  • Data linkage
  • Natural Language Processing
  • Big data processing (ML, Genomics)

10 of 50

SeRP - Background

Monash SeRP Features (Researchers)

Pre-installed suite of software

  • Customisable per project

Access and Storage

  • Storage shares for projects and users
  • Restricted access (e.g. no internet)

High-powered computing

  • “Flavours” (e.g. Small, Medium, Large, X-Large)
  • GPU support

11 of 50

SeRP - Background

12 of 50

1.2. Technical / Implementation constraints from business / use case (SeRP)

Response

Point 1

The ability to use existing building blocks (lego pieces) for deploying Monash SeRP is a priority. This includes the hosting platform, storage infrastructure, identity system and remote desktop interface.

Point 2

Security on the infrastructure has to be uplifted to support hosting of sensitive information. This includes the hosting platform, storage backend and configuration procedures.

Point 3

The platform needs to empower data custodians to apply their data governance requirements to their own projects and to manage each project's lifecycle. This includes the ability to authorise project members and to approve data ingress and egress transactions.

13 of 50

1.3. Solution (Technical / Architecture) to resolve constraints (SeRP)

Response

Point 1

Monash SeRP uses R@CMon (Monash Research Cloud) for hosting, Secure Research Data Storage for storage, Monash AD for identity and domain policies, and Leostream for the remote desktop interface. The Monash Security and Privacy Office teams provide assessment and support.

Point 2

Monash SeRP’s infrastructure uses dedicated private networks, managed via SDN (Neutron) with delegated access to the Monash SeRP project. Additional network ACLs and security groups are in place for added security controls. A dedicated set of hypervisors is used for hosting the Monash SeRP infrastructure.

Point 3

Monash SeRP’s Secure 3 (S3) model allows projects to be managed by non-technical teams (e.g. by data custodians). This is done using SeRP’s projects web portal. Data custodians have full control over the project roles assigned to project members.
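The custodian-governed model above (custodians assign roles and approve file-in / file-out requests) could be sketched roughly as follows. This is an illustration only, not Monash SeRP's implementation: the `Project`, `TransferRequest`, `Role` and `Status` names and all method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    CUSTODIAN = "custodian"
    RESEARCHER = "researcher"


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class TransferRequest:
    requester: str
    filename: str
    direction: str  # "in" (ingress) or "out" (egress)
    status: Status = Status.PENDING


@dataclass
class Project:
    """Hypothetical project record; the creating custodian is the first member."""
    name: str
    custodian: str
    members: dict = field(default_factory=dict)   # username -> Role
    requests: list = field(default_factory=list)

    def __post_init__(self):
        self.members[self.custodian] = Role.CUSTODIAN

    def _require_custodian(self, actor: str) -> None:
        if self.members.get(actor) is not Role.CUSTODIAN:
            raise PermissionError(f"{actor} is not a custodian of {self.name}")

    def assign_role(self, actor: str, user: str, role: Role) -> None:
        # Only a custodian may change project membership.
        self._require_custodian(actor)
        self.members[user] = role

    def request_transfer(self, requester: str, filename: str, direction: str) -> TransferRequest:
        # Any project member may ask; nothing moves until a custodian decides.
        if requester not in self.members:
            raise PermissionError(f"{requester} is not a member of {self.name}")
        if direction not in ("in", "out"):
            raise ValueError("direction must be 'in' or 'out'")
        request = TransferRequest(requester, filename, direction)
        self.requests.append(request)
        return request

    def decide(self, actor: str, request: TransferRequest, approve: bool) -> None:
        self._require_custodian(actor)
        request.status = Status.APPROVED if approve else Status.REJECTED
```

In this sketch a researcher can raise an egress request, but only a custodian can move it out of the pending state.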

15 of 50

2. AIS - Background

Integration between federated nodes

Synchronization of projects between nodes

Harmonized data types and metadata

Shared containerized analysis pipelines

CLARA Federated Learning

Distributed Federation of Nodes

Each node is a self-contained data centric secure analysis platform under its own governance

Functions like a research Vendor Neutral Archive (VNA)

1 findable node

9 current nodes

3 planned nodes

7 nodes in discussion

16 of 50

2.1. Key business cases for analysis of sensitive data (AIS)

Response

Standard Useful De-Identification

HIPAA, Privacy Act, GDPR

Need to balance scientific usefulness and benefit to society vs patient privacy. Needs to be automated and transparent.

Data Centric Computing

Secure transfer and provenance between instrument, storage, and compute, using structured data and a RESTful API. Everything must be browser accessible from within clinical sites.

Time to Research

Streamlined data egress from clinical sites and self-service project setup and analysis environments

On-demand compute and no terminals!

AIS Project Overview

De-identification Decision-Making Framework

17 of 50

2.2. Implementation constraints from business case (AIS)

Response

Portability

Needs to run in multiple environments, both in Australia and internationally: on-premises, AWS, Azure, Google, and OpenStack.

Security

Many imaging modalities are inherently identifiable. Security and auditing need to be approved by state health bodies and Local Health Districts.

Scalability

Studies with 50,000+ patients. Imaging equipment with 5+ TB per day. Expected >1 PB for some nodes.

Proprietary vendor data and new emerging techniques.
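To put the scalability figures above in perspective, the short calculation below works out how long a single 5 TB/day instrument takes to produce 1 PB, and the sustained transfer rate needed to keep up with it. Decimal units (1 TB = 10^12 bytes) and a constant daily volume are assumptions made for the example.

```python
TB = 10**12            # decimal terabyte (assumption)
PB = 10**15            # decimal petabyte (assumption)
SECONDS_PER_DAY = 86_400


def days_to_fill(capacity_bytes: int, bytes_per_day: int) -> float:
    """Days until a store of the given capacity is full at a constant daily rate."""
    return capacity_bytes / bytes_per_day


def sustained_rate_mb_s(bytes_per_day: int) -> float:
    """Average rate in MB/s needed to move one day's data within one day."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


print(days_to_fill(PB, 5 * TB))                  # 200.0
print(round(sustained_rate_mb_s(5 * TB), 1))     # 57.9
```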

Technologies

Kubernetes Implementations

18 of 50

2.3. Solution to resolve constraints (AIS)

Response

Kubernetes + Service Mesh (Portability, Scalability, Security)

Each node provides infrastructure up to and including the Kubernetes control plane.

AIS is a cloud-native serverless design that runs on Kubernetes.

IRAP Reference Architecture.

Clinical Trials Processor (De-Identification)

Fleet of edge devices (1 per site for DICOM, 1 per instrument type for non-DICOM) that manage de-identification, whitelisting, encryption, and routing before leaving the clinical site.

Granular Access Control at Application Layer

Auditable access control to both user and environment based on subject and data type defined per project.

21 CFR Part 11 hardened version for clinical trials and e-signatures.

XNAT 21 CFR Part 11

Clinical Trials Processor

XNAT Access Control

19 of 50

2.4. De-Identification and Access Control for AIS

Granular Access Control at Application Layer

Segregated access for source clinical data and research projects

Clear HREC approval & patient consent mapping

Allows lifecycle management

CTP Process

1) Whitelist for approved projects

2) De-identification template

3) Routing “Project” “Session” “Scan” metadata

4) Encrypted upload
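The four CTP steps above can be sketched as a small pipeline over header metadata. This is not the Clinical Trials Processor's actual configuration or API: the whitelist, the template format, the salted-hash pseudonymisation and every function name are hypothetical, and the encrypted upload is represented by a caller-supplied `upload` function rather than real transport code.

```python
import hashlib

APPROVED_PROJECTS = {"PROJ001"}  # hypothetical whitelist of approved projects

# Hypothetical de-identification template: header field -> action.
DEID_TEMPLATE = {
    "PatientName": "remove",
    "PatientID": "hash",
    "StudyDate": "keep",
    "Modality": "keep",
}


def deidentify(header: dict, template: dict, salt: str) -> dict:
    """Apply the template; fields not listed in it are dropped by default."""
    cleaned = {}
    for tag, value in header.items():
        action = template.get(tag, "remove")
        if action == "keep":
            cleaned[tag] = value
        elif action == "hash":
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            cleaned[tag] = digest[:16]
    return cleaned


def process(header: dict, project: str, session: str, scan: str,
            upload, salt: str = "demo-salt") -> dict:
    # 1) Whitelist: only approved projects may leave the clinical site.
    if project not in APPROVED_PROJECTS:
        raise PermissionError(f"project {project!r} is not whitelisted")
    # 2) De-identification template.
    cleaned = deidentify(header, DEID_TEMPLATE, salt)
    # 3) Routing metadata.
    cleaned["routing"] = {"project": project, "session": session, "scan": scan}
    # 4) Hand over to the (assumed encrypted) upload channel.
    upload(cleaned)
    return cleaned
```

Dropping unlisted fields by default means a new, unreviewed header field is withheld rather than leaked.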

21 of 50

3. Secure Data Enclaves - Zero-Trust Architecture

FIPS: Federal Information Processing Standard

ACI: Cisco Application Centric Infrastructure

22 of 50

What are Data Providers asking for?

23 of 50

Enterprise Grade Security and Capabilities

24 of 50

Services Delivered Using Secure Data Enclaves (SDE)

25 of 50

Services Delivered Using Secure Data Enclaves (SDE)

Virtual Data Room - Secure Virtual Desktop Environment

26 of 50

Services Delivered Using Secure Data Enclaves (SDE)

Secure HPCaaS

28 of 50

4. Australian Computer Assisted Theragnostics (AusCAT) - Background

  • Clinical decision support models – comparison of clinician and model prediction.
  • Seeking tools to stratify patients for treatment using clinical/imaging/segmentation data nationally and internationally.
  • Concept of project in 2013 – started with a grant late 2014.

29 of 50

4.1. Key business/use case for analysis of sensitive data (AusCAT)

  • Detailed cancer-related data, including medical imaging, are held within hospital databases.
  • Useful for clinical decision models and auditing tools not available in current cancer registries.
  • Data is in silos. Centralising data for each project requires a time-consuming governance framework with many administrative overheads.
  • We chose to use a federated / distributed learning approach to avoid transmission of individual patient data.

30 of 50

4.2. Technical / Implementation constraints from business / use case (AusCAT)

  • Federated / distributed learning network for machine learning.
    • No data transmitted at patient level.
    • Only aggregates, statistics and model parameters are transmitted.
  • Working with local hospital resources and IT groups.
  • Tools must answer clinical questions and be suitable for research work, using the MATLAB and Python scientific languages.
  • Machine learning approaches: direct methods, consensus models, approximate models.

S. Boyd et al., “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, Foundations and Trends in Machine Learning, 2011.
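As one illustration of a "direct method" in which only aggregates leave each site, the sketch below fits a one-variable linear regression from per-site sufficient statistics. It is a toy example under stated assumptions, not AusCAT's code: the function names are hypothetical, and a real deployment would also need to consider whether aggregates from very small sites could disclose individual records.

```python
def local_statistics(xs, ys):
    """Runs inside each hospital: returns aggregates only, never patient rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def combine(site_stats):
    """Runs at the coordinator: pools the aggregates and solves least squares."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    total = {k: sum(stats[k] for stats in site_stats) for k in keys}
    n, sx, sy, sxx, sxy = (total[k] for k in keys)
    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("x has no variance across sites; slope is undefined")
    slope = (n * sxy - sx * sy) / denominator
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Two sites holding points that lie on y = 2x + 1.
site_a = local_statistics([1, 2, 3], [3, 5, 7])
site_b = local_statistics([4, 5], [9, 11])
print(combine([site_a, site_b]))  # (2.0, 1.0)
```

Because the sums are additive, the pooled fit is identical to the one obtained by centralising the rows, which is what makes this a direct rather than an approximate method.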

31 of 50

4.3. Solution (Technical / Architecture) to resolve constraints (AusCAT)

  • Python & Pentaho Data Integration for ETL. PostgreSQL for the research DB. Clinical Trial Processor (CTP) for de-identification of DICOM. Orthanc PACS for storage of DICOM. RDF4J for standardized data storage mapped to an ontology.
  • Web services in Java with client/server TLS certificate authentication, with separate services for deployment and for message passing for machine learning.
  • Simulation environment for algorithm development with test/public data.
  • System restricted to a small collaborator user group. Infrastructure resources are limited in AusCAT; international collaborators have also installed server node instances.

33 of 50

5. ERICA - Background

  • There is an imperative that routinely-collected administrative and operational clinical data used for population health, health services and clinical research is kept very safe
    • The data are almost always sensitive and, due to high levels of detail, and/or longitudinal linkage, are also almost always potentially re-identifiable, even if prima facie de-identified.
    • The Five Safes framework is considered best practice for addressing such issues.
    • ERICA specifically addresses the Safe Settings criterion of Five Safes, although mandatory ERICA training also partially addresses the Safe People requirement.

  • ERICA provides a secure remote-access enclave solution to satisfy the Safe Settings requirements.
    • The threat model which ERICA addresses assumes that users of ERICA (researchers) are “honest but sloppy”, while everyone else is assumed to be hostile.

  • ERICA is also a meta-framework for setting up independent instances of ERICA (each hosting hundreds of projects)
    • Thus universities, research institutes or data providers can all run their own entirely independent instances of ERICA, with complete control over the instance and its users, without relying on third-party operators.
    • ERICA instances therefore become cloud-based parts of each institution’s own IT infrastructure.
    • Only the underlying software code base is shared between ERICA instances, through a consortium arrangement.

34 of 50

5.1. Key business/use case for analysis of sensitive data (ERICA)

Response

Linked, whole-of-population administrative health data for population health and health services research

  • Typically de-identified, only a moderate level of medical detail, but may include linked records for every person in a population over several decades.
    • Always de-identified, but the data are trivially re-identifiable and thus represent a huge hazard if stolen or accessed inappropriately.
  • Traditionally analysed using legacy statistical software (SAS, SPSS, Stata etc) as flat files on Windows desktops (and laptops!)
    • Institutions need to be able to use existing site licenses for legacy proprietary software, which the ERICA model facilitates
  • Increasingly, ML and deep learning methods are being used, requiring open-source-friendly compute facilities (Linux servers) with large memory and GPUs.

Complex clinical data extracted from hospital and GP EMR systems

  • Very complex source systems (e.g. CERNER, MOSAIQ, ARIA)
  • Complex data transformation pipelines into a researcher-ready (or researcher-friendly) form such as OHDSI OMOP common data model/vocab
  • Researchers require high-performance relational database support, plus compute for ETL pipelines, and ML/deep learning facilities with GPUs, large memory etc using exclusively open-source stacks
  • Large diagnostic imaging and radiomics data may be involved
    • E.g. CaVa, ACDN, CardiacAI

Health and medical research used to be a sleepy backwater in terms of computing requirements: Windows desktops, spreadsheets and legacy proprietary stats packages were all that were required.

But it is rapidly changing: now state-of-the-art ML and deep learning software stacks and the computing infrastructure to run them are becoming de rigueur and a sine qua non.

35 of 50

5.2. Technical / Implementation constraints from business / use case (ERICA)

Response

Research projects come and go

  • Some research projects last for years or decades and require a stable computing environment over that lifetime
  • Many last for just months, student projects often just weeks
  • Project set-up may be urgent in the minds of the researchers (and then nothing happens for months)

Health research is a stop-start, episodic undertaking

  • A lot of time is spent waiting for governance approvals and data supply
  • Researchers are rarely dedicated full-time to a project, take long holidays, have to teach for a whole term etc
  • Thus allocating expensive computing infrastructure is inefficient; on-demand is much better

Computing infrastructure for health research is cost-sensitive

  • Research funding rarely includes sufficient budget for computing and data storage (v. big sigh).

Cutting-edge open source software is rarely designed to be deployed in highly secure computing environments

  • Open access to the internet, GitHub etc is usually assumed
    • Creative work-arounds are required, and these may involve trade-offs

36 of 50

5.3. Solution (Technical / Architecture) to resolve constraints (ERICA)

ERICA embraces IaaS and the rich set of managed services available from commercial cloud providers

  • ERICA is built on AWS and uses about 30 distinct AWS managed services to dramatically reduce system admin and maintenance overheads (the downside is increased usage costs, but everything can be turned off or suspended when not needed).
  • Most of the set-up and all operations are done using well-tested, version-controlled CloudFormation templates, scripts, Lambda processes etc.
    • This mitigates the main weakness of cloud computing: misconfiguration errors.
  • Core components of each ERICA instance are all redundant and/or self-healing, leveraging automated and scheduled rebuilding and current CI/CD and DevOps deployment tools and principles, resulting in very high reliability and availability

ERICA provides flexibility

  • Resources within each project space can be turned off or suspended to save operating costs
  • Custom compute and storage and database facilities can easily be provisioned inside each project space, turned on and off by end users as required

Audit trails everywhere (and security accreditation)

  • All data going in and research results coming out of every ERICA project space is logged and captured for potential later analysis
    • End users know that if they cheat, they are likely to be caught (with major career consequences through misconduct proceedings etc)
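One way to make an ingress/egress audit trail hard to alter after the fact is to chain each entry to the hash of the previous one. The sketch below illustrates that idea only; it is not a description of how ERICA implements logging, and the `AuditLog` class, its field names and the use of hash chaining are all assumptions made for the example. Timestamps are left out to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def record(self, user: str, project: str, direction: str, filename: str) -> dict:
        previous = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "user": user,
            "project": project,
            "direction": direction,   # "in" or "out"
            "filename": filename,
            "prev": previous,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        previous = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != previous or _digest(body) != entry["hash"]:
                return False
            previous = entry["hash"]
        return True
```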

38 of 50

6. AARNet Secure Data Service - Background, Overview

Frankie Stevens, Robert Pocklington

39 of 50

6.1. Key business/use case for analysis of sensitive data (AARNet)

Response

Discipline Agnostic

  • Appropriate for use with sensitive data of a Health & Medical, Cultural, Ecological, Indigenous, Commercial nature etc...

Easy Cross Institutional Collaboration

  • Access to the service is through existing institutional credentials
  • International access and non-AAF enabled access is also possible

Auditable and Controlled Collaboration

  • Full Audit logs enable complete visibility of all actions
  • Roles enable appropriate Institutional and Research control on collaboration - which includes the ability to download data where appropriate

AARNet Sensitive Data Service - The Story so Far

Trusted National Infrastructure Provider

  • Not for Profit, owned by the sector, for the sector
  • No need for institutions to host their own solutions
  • AARNet hosted cloud environment - Secure yet convenient

Want to see more?

40 of 50

6.2. Technical / Implementation constraints from business / use case (AARNet)

Response

Resourcing and COVID Constraints

  • COVID-19 lockdowns in Australia
  • Hardware supply restrictions and server installation delays
  • Hiring challenges

Development Constraints

  • Increased use of AARNet services during COVID-19
  • Small team with fortnightly demos and discussions

Compliance Constraints for Sensitive Data Projects

  • How do we ensure our users can use our platform with confidence, balancing usability and convenience against legislation and best-practice security standards?

41 of 50

6.3. Solution (Technical / Architecture) to resolve constraints (AARNet)

Response

Resourcing and COVID-19 Solutions

  • Repurposed existing hardware for proof of concept
  • Onboarding new starters

Development Solutions

  • Agile development process
  • Strict focus on building and validating key features with users
  • Utilising containerisation and Kubernetes for scale and redundancy
  • New user interface which is faster, modern, intuitive, accessible and responsive

Compliance Solutions for Sensitive Data Projects

  • Increased login security via multi-factor authentication
  • Introduced stricter access controls
  • Implemented fine-grained exportable audit logging
  • Disabled ad-hoc sharing of files or folders and personal space
  • Certifying our processes with ISO 27001 and reviewing other established state-based legislation and data sovereignty concerns
  • Aiming towards Safe Setting accreditation under the upcoming Data Availability & Transparency Act
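The slides do not say which second factor the service uses. As one common option, the sketch below implements time-based one-time passwords following RFC 6238 (built on HOTP from RFC 4226) using only the standard library. Treat the choice of TOTP, the 30-second step and the one-step drift window as assumptions for the example.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with SHA-1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time-step counter."""
    return hotp(secret, int(unix_time) // step, digits)


def verify_totp(secret: bytes, submitted: str, unix_time: float,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    current = int(unix_time) // step
    for drift in range(-window, window + 1):
        counter = current + drift
        if counter < 0:
            continue
        if hmac.compare_digest(hotp(secret, counter, digits), submitted):
            return True
    return False


# RFC 6238 test vector: ASCII secret "12345678901234567890" at T = 59 s.
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```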

AARNet Sensitive Data Pilot Brochure

Want to know more?

43 of 50

7. OMOP PSB - Background

  • Hospital data is held in complex EMR systems, which makes it difficult for IT departments to clean and provision for researchers, and complex for researchers to request and utilise
  • The OHDSI OMOP Common Data Model (OMOP) is an international de facto standard common data model that collapses clinical data to fifteen tables for research purposes, simplifying both access mechanisms and research use

This project aims to create an extensible, distributed national data asset and will:

  • convert data from three large CERNER hospital EMRs (Queensland Public Health, Austin Health Melbourne, Western Health Melbourne) to the OHDSI OMOP CDM
  • evaluate the ability to convert EPIC EMR data
  • test the ability to undertake research using the model
  • make the tools and terminology conversions available to support further national conversion of hospital EMRs

44 of 50

7.1. Key business/use case for analysis of sensitive data (OMOP PSB)

Response

Point 1

Hospital EMRs contain many thousands of data tables in complex schemas that require huge expertise to understand and compile for research use. We will draw on previous experience in converting such data to the OMOP CDM to convert datasets from 3 hospitals, and will make the knowledge publicly available to support further conversions.

Point 2

OMOP has a large international community that is continually developing open-source tools that work on top of the fifteen standard tables. Free training and access to these tools are available, acting as a force multiplier for research.

Point 3

Data governance is completely transformed: instead of requesting access to a dataset, the researcher requests that their research model (usually developed in R) be run on a data repository. That is, the researcher gets results only, without a hospital needing to agree to a data release.
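The "send the analysis to the data" pattern can be sketched as below. The slide notes that such models are usually written in R; Python is used here only to keep the examples in one language. The `run_on_repository` wrapper, the minimum cell size of 5 and the small-cell suppression rule are assumptions added for illustration and are not described in the talk.

```python
MIN_CELL_SIZE = 5  # hypothetical disclosure-control threshold


def count_by_gender(rows):
    """Researcher-supplied analysis: counts per OMOP gender_concept_id."""
    counts = {}
    for row in rows:
        key = row["gender_concept_id"]
        counts[key] = counts.get(key, 0) + 1
    return counts


def run_on_repository(rows, analysis):
    """Runs inside the hospital; only aggregate results are returned.

    Counts below MIN_CELL_SIZE are replaced with None so that small
    groups are not disclosed.
    """
    result = analysis(rows)
    return {
        key: (count if count >= MIN_CELL_SIZE else None)
        for key, count in result.items()
    }


# Toy repository: six records with concept 8507 (male), two with 8532 (female).
repository = [{"gender_concept_id": 8507}] * 6 + [{"gender_concept_id": 8532}] * 2
print(run_on_repository(repository, count_by_gender))  # {8507: 6, 8532: None}
```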

About OHDSI:

Free training:

Open source tools:

45 of 50

7.2. Technical / Implementation constraints from business / use case (OMOP PSB)

Response

Point 1

The overhead of preparing data in data warehouses for research use is very high, and non-standard terminologies are utilised. Research outputs are costly and often have long delays due to process issues and staff availability.

Point 2

The non-standard nature of each hospital EMR instance and the complexity of converting data from the native format present a huge barrier to standardisation. A conversion needs to be undertaken very carefully to avoid introducing data quality issues.

Point 3

Expertise is not present in the hospitals for them to consider how to undertake such a conversion. Internationally, SMEs and some research groups have developed expertise. We are starting to develop some expertise in Australia, but we wish to place the knowledge and tools that support Australian conversions in the public domain, to lower cost and increase the rate of conversion.

46 of 50

7.3. Solution (Technical / Architecture) to resolve constraints (OMOP PSB)

Response

Point 1

UNSW have experience developing a tool to automate the ETL of converting CERNER data repositories to the OMOP table format. The tool utilises YAML to hold the configuration applied to each CERNER repository.
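A configuration-driven conversion of this kind could look roughly like the sketch below. It is not the UNSW tool: the configuration layout, the source column names (`PATIENT_KEY`, `BIRTH_YEAR`, `SEX`) and the `transform` function are hypothetical, and a plain dictionary stands in for a parsed YAML file so the example needs no third-party library. The target fields belong to the OMOP `person` table, and 8507 / 8532 are the standard OMOP gender concepts for male and female, with 0 meaning no matching concept.

```python
# Stand-in for a per-repository YAML file after parsing (hypothetical layout).
CONFIG = {
    "source_table": "patients",
    "target_table": "person",
    # target OMOP column -> source column copied as-is
    "columns": {
        "person_id": "PATIENT_KEY",
        "year_of_birth": "BIRTH_YEAR",
    },
    # target OMOP column -> terminology lookup applied to a source column
    "lookups": {
        "gender_concept_id": {
            "source": "SEX",
            "map": {"M": 8507, "F": 8532},
            "default": 0,
        },
    },
}


def transform(row: dict, config: dict) -> dict:
    """Convert one source row to a target row according to the configuration."""
    out = {target: row[source] for target, source in config["columns"].items()}
    for target, spec in config["lookups"].items():
        out[target] = spec["map"].get(row.get(spec["source"]), spec["default"])
    return out


source_row = {"PATIENT_KEY": 42, "BIRTH_YEAR": 1970, "SEX": "F"}
print(transform(source_row, CONFIG))
```

Keeping the site-specific mappings in configuration rather than code means the same conversion logic can be reused across repositories, with only the configuration file changing.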

Point 2

In addition to the UNSW experience, the University of Melbourne have experience in OMOP conversions and also clinical experience in managing the complexity of terminology conversions.

Point 3

The project will merge the experience of UNSW and UoM to develop quality data conversions, and will work with the AHRA Transformational Data Collaboration and OHDSI Australia, in conjunction with the ARDC, to make the conversion tools and lookups available nationally for non-commercial utilisation.

AHRA Transformational Data Collaboration

OHDSI Australia

48 of 50

Panel Discussion - for all our speakers.

  • In this discussion we’ll explore:
      • Technological problems encountered as a direct result of the sensitivity of the data involved.
      • Were there similarities with your use cases / problems you heard from others today?
      • Similarities in the technical solutions and architectures that were arrived at.
      • Any key enabling (or problematic) technologies?
      • What might you do differently based on what you’ve heard today?

49 of 50

Questions from the Audience?

50 of 50

Thank you!

More information on the Tech Talks page: https://sites.google.com/ardc.edu.au/techtalk2020/talks

ARDC is enabled by NCRIS.