Tech Talk 4 - Platforms for Sensitive Data Analysis:
SeRP, AIS, SDE, AusCAT, ERICA, AARNet, OMOP PSB.
15 APRIL 2021
TIM CHURCHES
ERICA
Snr Research Fellow, Health Data Science, UNSW Medicine
timothy.churches@unsw.edu.au
FRANKIE STEVENS
AARNet
AARNet Associate Director, eResearch
frankie.stevens@aarnet.edu.au
DOUGIE BOYLE
OMOP PSB
Prof, Health Data Science, The University of Melbourne
dboyle@unimelb.edu.au
MATTHEW FIELD
AusCAT
Research Fellow, UNSW Medicine
matthew.field@unsw.edu.au
ANITHA KANNAN
SeRP
Director Research Platform, Monash University
anitha.kannan@monash.edu
RYAN SULLIVAN
AIS
Product Specialist - Characterization
ryan.sullivan@sydney.edu.au
AMR HASSAN
SDE
Delivery Leader TS and eResearch, Monash University
amr.hassan@monash.edu
Guest Chair
Dr Steven McEachern
Director Australian Data Archive, ANU
steven.mceachern@anu.edu.au
ARDC is enabled by NCRIS.
Questions for each speaker to address
Overview
What is your key business / use case for analysis of sensitive data?
What technical / implementation constraints did this create?
Either as a result of governance, security, data format or management issues particular to the sensitive data involved.
What was your solution (technology/architecture) to these constraints?
1.1. Key business/use case for analysis of sensitive data (SeRP)
Response
Scalable Data Governance
Data Governance includes the policies, processes, roles and responsibilities around data collection, management, use and data protection
Impact of barriers to Data Governance
Valuable research data is locked up in silos
Challenges
Data governance at scale can be resource intensive & challenging
There are no commercial solutions or national platforms for data governance that provide a secure, trusted and scalable environment that leverages existing institutional research infrastructure
SeRP - Background
Monash SeRP Infrastructure
Building Blocks
Monash SeRP Security
Monash SeRP Workflow
Monash SeRP Features (Custodians)
Data custodian governed:
Modules/features in development:
Monash SeRP Features (Researchers)
Pre-installed suite of software
Access and Storage
High powered computing
1.2. Technical / Implementation constraints from business / use case (SeRP)
Response
Point 1
The ability to use existing building blocks (lego pieces) for deploying Monash SeRP is a priority. This includes the hosting platform, storage infrastructure, identity system and remote desktop interface.
Point 2
Security on the infrastructure has to be uplifted to support hosting of sensitive information. This includes the hosting platform, storage backend and configuration procedures.
Point 3
The platform needs to empower data custodians to apply their data governance requirements to their own projects and to manage each project's lifecycle. This includes the ability to authorise project members and data ingress and egress transactions.
1.3. Solution (Technical / Architecture) to resolve constraints (SeRP)
Response
Point 1
Monash SeRP uses R@CMon (Monash Research Cloud) for hosting, Secure Research Data Storage for storage, Monash AD for identity and domain policies, and Leostream for the remote desktop interface. The Monash Security and Privacy Office teams provide assessment and support.
Point 2
Monash SeRP's infrastructure uses dedicated private networks, managed via SDN (Neutron) with delegated access to the Monash SeRP project. Additional network ACLs and security groups are in place for added security controls. A dedicated set of hypervisors is used for hosting the Monash SeRP infrastructure.
Point 3
Monash SeRP's Secure 3 (S3) model allows projects to be managed by non-technical teams (e.g. by data custodians). This is done using SeRP's projects web portal. Data custodians have full control over the project roles assigned to members.
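The custodian-governed project model described in Point 3 can be illustrated with a minimal sketch. Everything here (the `Project` and `Transfer` classes, the role names and the method names) is hypothetical and chosen for illustration only; it is not the Monash SeRP portal's actual data model or API. The sketch assumes that only a custodian may assign roles or approve transfers, and that every action is written to an audit log.

```python
from dataclasses import dataclass, field

# Hypothetical role names; the slides do not list the real SeRP project roles.
CUSTODIAN = "custodian"
RESEARCHER = "researcher"


@dataclass
class Transfer:
    """A data ingress or egress request awaiting a custodian decision."""
    requester: str
    direction: str  # "ingress" or "egress"
    filename: str
    approved: bool = False


@dataclass
class Project:
    name: str
    roles: dict = field(default_factory=dict)      # username -> role
    transfers: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def _require_custodian(self, actor: str) -> None:
        if self.roles.get(actor) != CUSTODIAN:
            raise PermissionError(f"{actor} is not a custodian of {self.name}")

    def assign_role(self, actor: str, user: str, role: str) -> None:
        """Only a custodian may authorise project members."""
        self._require_custodian(actor)
        self.roles[user] = role
        self.audit_log.append((actor, "assign_role", user, role))

    def request_transfer(self, user: str, direction: str, filename: str) -> Transfer:
        """Any project member may request a transfer; it starts unapproved."""
        if user not in self.roles:
            raise PermissionError(f"{user} is not a member of {self.name}")
        if direction not in ("ingress", "egress"):
            raise ValueError("direction must be 'ingress' or 'egress'")
        transfer = Transfer(user, direction, filename)
        self.transfers.append(transfer)
        self.audit_log.append((user, "request_" + direction, filename))
        return transfer

    def approve(self, actor: str, transfer: Transfer) -> None:
        """Only a custodian may approve data moving in or out of the project."""
        self._require_custodian(actor)
        transfer.approved = True
        self.audit_log.append((actor, "approve_" + transfer.direction, transfer.filename))
```

In this sketch a custodian would call `assign_role` to add a researcher, the researcher would call `request_transfer`, and nothing leaves the project until the custodian calls `approve`.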
2. AIS - Background
Integration between federated nodes
Synchronization of projects between nodes
Harmonized data types and metadata
Shared containerized analysis pipelines
CLARA Federated Learning
Distributed Federation of Nodes
Each node is a self-contained data centric secure analysis platform under its own governance
Functions like a research Vendor Neutral Archive (VNA)
1 findable node
9 current nodes
3 planned nodes
7 in discussion nodes
2.1. Key business cases for analysis of sensitive data (AIS)
Response
Standard Useful De-Identification
HIPAA, Privacy Act, GDPR
Need to balance scientific usefulness and benefit to society vs patient privacy. Needs to be automated and transparent.
Data Centric Computing
Secure transfer and provenance between instrument, storage, and compute, using structured data and a RESTful API. Everything must be browser accessible from within clinical sites.
Time to Research
Streamlined data egress from clinical sites and self-service project setup and analysis environments
On-demand compute and no terminals!
AIS Project Overview
De-identification Decision-Making Framework
2.2. Implementation constraints from business case (AIS)
Response
Portability
Needs to run in multiple environments, both in Australia and internationally: on-premises, AWS, Azure, Google, and OpenStack.
Security
Many imaging modalities are inherently identifiable. Security and auditing need to be approved by state health bodies and Local Health Districts.
Scalability
Studies with 50,000+ patients. Imaging equipment with 5+ TB per day. Expected >1 PB for some nodes.
Proprietary vendor data and new emerging techniques.
Technologies
Kubernetes Implementations
2.3. Solution to resolve constraints (AIS)
Response
Kubernetes + Service Mesh (Portability, Scalability, Security)
Every node provides up to the Kubernetes control plane.
AIS is a cloud-native serverless design that runs on Kubernetes.
IRAP Reference Architecture.
Clinical Trials Processor (De-Identification)
Fleet of edge devices (1 per site for DICOM, 1 per instrument type for non-DICOM) that manage de-identification, whitelisting, encryption, and routing before leaving the clinical site.
Granular Access Control at Application Layer
Auditable access control to both user and environment based on subject and data type defined per project.
21 CFR Part 11 hardened version for clinical trials and e-signatures.
XNAT 21 CFR Part 11
Clinical Trials Processor
XNAT Access Control
2.4. De-Identification and Access Control for AIS
Granular Access Control at Application Layer
Segregated access for source clinical data and research projects
Clear HREC approval & patient consent mapping
Allows lifecycle management
CTP Process
1) Whitelist for approved projects
2) De-identification template
3) Routing “Project” “Session” “Scan” metadata
4) Encrypted upload
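The four CTP steps listed above can be sketched as a small pipeline. This is an illustrative sketch only: it does not use CTP's real configuration syntax or API, the header is modelled as a plain dictionary rather than a parsed DICOM object, and the project identifier, template actions and salted-hash pseudonymisation are all assumptions. Step 4 (encrypted upload) is left as a comment, because it would depend on the transport and key management used at each site.

```python
import hashlib

# Hypothetical whitelist of approved project identifiers (step 1).
APPROVED_PROJECTS = {"PROJ001"}

# Hypothetical de-identification template (step 2): one action per header field.
# Fields that are not listed are dropped, i.e. deny by default.
TEMPLATE = {
    "PatientName": "remove",
    "PatientID": "hash",
    "PatientBirthDate": "remove",
    "Modality": "keep",
    "StudyDescription": "keep",
}


def pseudonymise(value: str, salt: str) -> str:
    """Replace an identifier with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def deidentify(header: dict, salt: str) -> dict:
    """Apply the template to a header, keeping only explicitly allowed fields."""
    cleaned = {}
    for key, value in header.items():
        action = TEMPLATE.get(key, "remove")
        if action == "keep":
            cleaned[key] = value
        elif action == "hash":
            cleaned[key] = pseudonymise(str(value), salt)
    return cleaned


def process(header: dict, project: str, session: str, scan: str, salt: str) -> dict:
    # 1) Whitelist: refuse anything that is not for an approved project.
    if project not in APPROVED_PROJECTS:
        raise PermissionError(f"project {project!r} is not whitelisted")
    # 2) De-identification template.
    payload = deidentify(header, salt)
    # 3) Routing metadata used to file the data at the destination.
    payload["routing"] = {"Project": project, "Session": session, "Scan": scan}
    # 4) Encrypted upload would happen here; it is deliberately not implemented.
    return payload
```

The deny-by-default template is a design choice for the sketch: a field that nobody has reviewed never leaves the clinical site.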
3. Secure Data Enclaves - Zero-Trust Architecture
FIPS: Federal Information Processing Standard
ACI: Cisco Application Centric Infrastructure
What are Data Providers asking for?
Enterprise Grade Security and Capabilities
Services Delivered Using Secure Data Enclaves (SDE)
Virtual Data Room - Secure Virtual Desktop Environment
Secure HPCaaS
4. Australian Computer Assisted Theragnostics (AusCAT) - Background
4.1. Key business/use case for analysis of sensitive data (AusCAT)
4.2. Technical / Implementation constraints from business / use case (AusCAT)
S. Boyd et al., “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, Foundations and Trends in Machine Learning, 2011.
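The slide content for this section did not survive beyond the citation, so how AusCAT applies ADMM is not stated here. As a generic illustration of the cited technique, the sketch below runs consensus ADMM on a toy problem: several sites each hold private `(a, b)` pairs and jointly fit a single slope `x` minimising the total squared error of `a * x - b`, exchanging only their current parameter estimates and never the raw records. The function names, the penalty `rho` and the iteration count are assumptions made for the example.

```python
def local_update(a, b, z, u, rho):
    """One site's x-update for f(x) = 0.5 * sum((a_j * x - b_j) ** 2).

    Minimises f(x) + (rho / 2) * (x - z + u) ** 2 in closed form,
    using only data held at that site.
    """
    numerator = sum(aj * bj for aj, bj in zip(a, b)) + rho * (z - u)
    denominator = sum(aj * aj for aj in a) + rho
    return numerator / denominator


def consensus_admm(sites, rho=10.0, iterations=200):
    """Fit a shared slope across sites without pooling their data.

    sites: list of (a, b) tuples, one per site.
    Returns the consensus estimate z.
    """
    n = len(sites)
    z = 0.0            # global consensus variable
    u = [0.0] * n      # scaled dual variable per site
    for _ in range(iterations):
        # Each site updates its local estimate from its own data.
        x = [local_update(a, b, z, u[i], rho) for i, (a, b) in enumerate(sites)]
        # The coordinator averages what the sites report.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Each site updates its dual variable from its disagreement with z.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z
```

For this convex toy problem the consensus value should approach the slope that a pooled least-squares fit over all sites would give, which is what makes the approach attractive when records cannot leave each site.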
4.3. Solution (Technical / Architecture) to resolve constraints (AusCAT)
5. ERICA - Background
5.1. Key business/use case for analysis of sensitive data (ERICA)
Response
Linked, whole-of-population administrative health data for population health and health services research
Complex clinical data extracted from hospital and GP EMR systems
Health and medical research used to be a sleepy backwater in terms of computing requirements: Windows desktops, spreadsheets and legacy proprietary stats packages were all that were required.
But it is rapidly changing: now state-of-the-art ML and deep learning software stacks and the computing infrastructure to run them are becoming de rigueur and a sine qua non.
5.2. Technical / Implementation constraints from business / use case (ERICA)
Response
Research projects come and go
Health research is a stop-start, episodic undertaking
Computing infrastructure for health research is cost-sensitive
Cutting-edge open source software is rarely designed to be deployed in highly secure computing environments
5.3. Solution (Technical / Architecture) to resolve constraints (ERICA)
ERICA embraces IaaS and the rich set of managed services available from commercial cloud providers
ERICA provides flexibility
Audit trails everywhere (and security accreditation)
6. AARNet Secure Data Service - Background, Overview
Frankie Stevens, Robert Pocklington
6.1. Key business/use case for analysis of sensitive data (AARNet)
Response
Discipline Agnostic
Easy Cross Institutional Collaboration
Auditable and Controlled Collaboration
AARNet Sensitive Data Service - The Story so Far
Trusted National Infrastructure Provider
Want to see more?
6.2. Technical / Implementation constraints from business / use case (AARNet)
Response
Resourcing and COVID Constraints
Development Constraints
Compliance Constraints for Sensitive Data Projects
6.3. Solution (Technical / Architecture) to resolve constraints (AARNet)
Response
Resourcing and COVID-19 Solutions
Development Solutions
Compliance Solutions for Sensitive Data Projects
AARNet Sensitive Data Pilot Brochure
Want to know more?
7. OMOP PSB - Background
This project aims to create an extensible, distributed national data asset and will:
7.1. Key business/use case for analysis of sensitive data (OMOP PSB)
Response
Point 1
Hospital EMRs contain many thousands of data tables in complex schemas that require considerable expertise to understand and compile for research use. We will draw on previous experience in converting such data to the OMOP CDM to convert datasets from 3 hospitals, and will make the knowledge publicly available to support further conversions.
Point 2
OMOP has a large international community that continually develops open-source tools that work on top of the fifteen standard tables. Free training and free access to these tools are available, which acts as a force multiplier for research.
Point 3
Data governance is completely transformed: instead of requesting access to a dataset, the researcher requests that their research model (usually developed in R) be run on a data repository. That is, the researcher receives only the results, without a hospital needing to agree to a data release.
About OHDSI:
Free training:
Open source tools:
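The governance model in Point 3, where the analysis travels to the data and only results come back, can be sketched as follows. The source notes that such models are usually written in R; Python is used here purely for illustration. The function names, the small-cell suppression rule and its threshold are assumptions and are not described in the slides. The `gender_concept_id` field and the concept IDs 8507 (male) and 8532 (female) follow OMOP CDM conventions.

```python
from collections import Counter

# Hypothetical disclosure-control threshold: counts below this are suppressed.
MIN_CELL_COUNT = 5


def run_at_repository(analysis, records):
    """Run a researcher-supplied analysis inside the data repository.

    The researcher never sees `records`; only aggregate counts that pass
    the disclosure check are released.
    """
    result = analysis(records)
    if not isinstance(result, dict):
        raise TypeError("analysis must return a dict of aggregate counts")
    released = {}
    for key, count in result.items():
        if not isinstance(count, int):
            raise TypeError("only integer counts can be released in this sketch")
        # Suppress small cells rather than releasing potentially identifying counts.
        released[key] = count if count >= MIN_CELL_COUNT else None
    return released


def count_by_gender_concept(records):
    """Example researcher model: count persons per OMOP gender concept."""
    return dict(Counter(r["gender_concept_id"] for r in records))
```

A hospital would call `run_at_repository(count_by_gender_concept, person_rows)` on its own infrastructure and return only the released dictionary to the researcher.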
7.2. Technical / Implementation constraints from business / use case (OMOP PSB)
Response
Point 1
The overhead of preparing data in data warehouses for research use is very high, and non-standard terminologies are used. Research outputs are costly and often subject to long delays due to process issues and staff availability.
Point 2
The non-standard nature of each hospital EMR instance and the complexity of converting data from the native format present a huge barrier to standardisation. A conversion needs to be undertaken very carefully to avoid introducing data quality issues.
Point 3
Hospitals do not have the in-house expertise to consider how to undertake such a conversion. Internationally, SMEs and some research groups have developed expertise. We are starting to develop some expertise in Australia, but we want the knowledge and tools that support Australian conversions to be in the public domain, to lower cost and increase the rate of conversion.
7.3. Solution (Technical / Architecture) to resolve constraints (OMOP PSB)
Response
Point 1
UNSW have experience developing a tool to automate the ETL of converting CERNER data repositories to the OMOP table format. The tool utilises YAML to hold the configuration applied to each CERNER repository.
Point 2
In addition to the UNSW experience, the University of Melbourne have experience in OMOP conversions and also clinical experience in managing the complexity of terminology conversions.
Point 3
The project will merge the experience of UNSW and UoM to develop quality data conversions, and will work with the AHRA Transformational Data Collaboration and OHDSI Australia, in conjunction with the ARDC, to make the conversion tools and lookups available nationally for non-commercial use.
AHRA Transformational Data Collaboration
OHDSI Australia
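Point 1 describes an ETL tool whose per-repository configuration is held in YAML. The sketch below shows the general idea of a configuration-driven mapping into the OMOP `person` table. It is not the UNSW tool: the configuration layout and the source column names (`PATIENT_KEY`, `BIRTH_YEAR`, `SEX_CODE`) are hypothetical, and a Python dictionary stands in for the parsed YAML so that the example has no third-party dependencies. The target column names and concept IDs (8507 male, 8532 female, 0 for no matching concept) follow OMOP CDM conventions.

```python
# Stand-in for a parsed YAML file; the equivalent YAML might look like:
#
#   target_table: person
#   columns:
#     person_source_value: {source: PATIENT_KEY}
#     year_of_birth: {source: BIRTH_YEAR, cast: int}
#     gender_concept_id:
#       source: SEX_CODE
#       lookup: {M: 8507, F: 8532}
#       default: 0
CONFIG = {
    "target_table": "person",
    "columns": {
        "person_source_value": {"source": "PATIENT_KEY"},
        "year_of_birth": {"source": "BIRTH_YEAR", "cast": "int"},
        "gender_concept_id": {
            "source": "SEX_CODE",
            "lookup": {"M": 8507, "F": 8532},
            "default": 0,  # OMOP convention for "no matching concept"
        },
    },
}


def transform_row(row: dict, config: dict) -> dict:
    """Map one source row to a target row using the column rules in config."""
    out = {}
    for target, rule in config["columns"].items():
        value = row.get(rule["source"])
        if "lookup" in rule:
            # Translate local codes to standard concept IDs.
            value = rule["lookup"].get(value, rule.get("default"))
        if rule.get("cast") == "int" and value is not None:
            value = int(value)
        out[target] = value
    return out
```

Keeping the site-specific mappings in configuration rather than code means the same transformation logic could be reused across repositories, with only the configuration changing per site.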
Panel Discussion - for all our speakers.
Questions from the Audience?
Thank you!
More information on the Tech Talks page: https://sites.google.com/ardc.edu.au/techtalk2020/talks
ARDC is enabled by NCRIS.