
A re-encryption REST service is being developed. To avoid exporting the User’s private key, the service runs in the User’s local environment. An extended version of the Galaxy client-side web application is able to reach the endpoint and carry out the transformation.

A new Galaxy data import tool is being developed. It queries the NO-FEGA REST API using the OIDC token issued by ELIXIR Life Science AAI when logging in to Galaxy.
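As a rough illustration of the token-based query described above, the sketch below prepares an authenticated request without sending it. The base URL, the `/datasets` path and the function name are hypothetical, since the poster does not give the NO-FEGA API layout; passing the OIDC access token as a bearer token is also an assumption.

```python
import urllib.request

# Hypothetical base URL: the real NO-FEGA REST API address is not given in the poster.
NO_FEGA_API = "https://no-fega.example.org/api"


def build_dataset_request(oidc_token: str, path: str = "/datasets") -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying the OIDC token.

    Assumption: the API accepts the access token in an
    'Authorization: Bearer <token>' header.
    """
    if not oidc_token:
        raise ValueError("an OIDC access token is required")
    return urllib.request.Request(
        NO_FEGA_API + path,
        headers={
            "Authorization": f"Bearer {oidc_token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

In a real import tool the token would be taken from the User's Galaxy login session, and the prepared request would be sent with `urllib.request.urlopen`.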

Sensitive Data Accessing and Processing in Galaxy

Laia Codó Tarraubella

laia.codo@bsc.es

These projects have received funding from the European Union’s Horizon 2020 research and innovation programme

@ELIXIREurope

/company/elixir-europe

INTRODUCTION

In the context of the ELIXIR Implementation Study ‘Strengthen Data Management in Galaxy’, novel data handling strategies are being explored to enable secure and scalable access to, and processing of, sensitive datasets from Galaxy instances. To this end, we propose an approach to access, transfer and process (F)EGA Crypt4GH-encrypted datasets in a secure environment, using a multi-user public Galaxy instance as enactor and controller of the full process.

(1) Spanish National Bioinformatics Institute (INB/ELIXIR-ES); (2) Barcelona Supercomputing Center (BSC), Spain; (3) University of Oslo (UiO), Norway; (4) University of Bergen (UiB), Norway; (5) Italian National Research Council (CNR), Italy; (6) Computational and Data Driven Science, University of Bradford (UoB); (7) Dept Biochemistry and Molecular Biomedicine, University of Barcelona (UB); (8) University of Freiburg, Germany; (9) Data Core, VIB, Belgium

This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101046203 (BY-COVID)

[Figure: data flow between the Data Provider, the Galaxy Public Instance (Galaxy server-side and Galaxy client-side), the Keys Server, the Compute Node/s and the User Local Environment. Labelled elements: DATASET 1 and DATASET 2 with HEADER D1, HEADER D2 and HEADER UK; keys pubEGAK, pubUK, privUK, pubCNK, privCNK, symD1, symD2 and symUK; DECRYPTION TOOL, ANALYSIS TOOL and ENCRYPTION TOOL; Galaxy pre-task and Galaxy post-task; HEADER RE-ENCRYPTOR SERVICE and HEADER RE-ENCRYPTOR CLIENT.]

1. Data import from (F)EGA to Galaxy server

Datasets are downloaded from (F)EGA as Crypt4GH files encrypted with the User’s key, i.e. only the User is able to decrypt them. When stored in Galaxy, the header is split from the encrypted body, which contains the sensitive dataset.
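The header/body split mentioned above can be sketched with the standard library alone. This is a minimal sketch, assuming the Crypt4GH version 1 layout: an 8-byte magic string, a 4-byte little-endian version, a 4-byte little-endian packet count, then header packets whose 4-byte length field counts itself. The function name is illustrative and not part of Galaxy or of any Crypt4GH library.

```python
import struct

MAGIC = b"crypt4gh"


def read_crypt4gh_header(stream) -> bytes:
    """Read and return the raw Crypt4GH header from a binary stream.

    On return the stream is positioned at the first byte of the
    encrypted body, so the body can be copied elsewhere unchanged.
    """
    preamble = stream.read(16)
    if len(preamble) != 16 or preamble[:8] != MAGIC:
        raise ValueError("not a Crypt4GH stream")
    version, n_packets = struct.unpack("<II", preamble[8:16])
    if version != 1:
        raise ValueError(f"unsupported Crypt4GH version {version}")

    header = bytearray(preamble)
    for _ in range(n_packets):
        raw_len = stream.read(4)
        if len(raw_len) != 4:
            raise ValueError("truncated header")
        (packet_len,) = struct.unpack("<I", raw_len)
        if packet_len < 4:
            raise ValueError("invalid header packet length")
        # The length field includes its own 4 bytes.
        payload = stream.read(packet_len - 4)
        if len(payload) != packet_len - 4:
            raise ValueError("truncated header packet")
        header += raw_len + payload
    return bytes(header)
```

Only the returned header bytes need to travel to the User for re-encryption; the body that follows in the stream can stay on the Galaxy server untouched.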

2. Crypt4GH header re-encryption

To enable eventual data decryption in the Galaxy compute backend, the original header is sent to the User’s premises, where it is decrypted and encrypted again, this time using the Compute Node key.
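One possible shape for a locally running header re-encryption endpoint is sketched below with Python's built-in HTTP server. The `/reencrypt` path and the function names are hypothetical, and the cryptographic step itself is deliberately left as a callable supplied by the caller (for instance a wrapper around a Crypt4GH library), because the poster does not describe the actual implementation.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(reencrypt_header):
    """Build a request handler around a header re-encryption callable.

    reencrypt_header(header_bytes) -> bytes is a placeholder for the real
    step: decrypt with the User's private key, then encrypt for the
    Compute Node public key. The private key never leaves this process.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/reencrypt":  # hypothetical endpoint path
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            header_in = self.rfile.read(length)
            try:
                header_out = reencrypt_header(header_in)
            except Exception:
                self.send_error(400, "header could not be re-encrypted")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(header_out)))
            self.end_headers()
            self.wfile.write(header_out)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def start_service(reencrypt_header, port=0):
    """Serve on the loopback interface only; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(reencrypt_header))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 keeps the service reachable only from the User's own machine. A browser-based Galaxy client calling it would additionally need CORS headers, which are omitted here.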

4. Running an analysis tool: input decryption

Before running an analysis tool or workflow, input data is decrypted on the compute node, immediately before being processed. Analysis tools remain unaltered.

5. Running an analysis tool: output encryption

Output is encrypted using the User’s key, which has been sent from the Galaxy server as an extra Crypt4GH file. Results are then downloaded and decrypted locally.

3. Galaxy Server as middleware

In the Galaxy instance, data is always kept encrypted, under the key of either the User or the Compute Node. In this way, the Galaxy administrator never has access to the User’s data.

[Figure labels: http://localhost; ENCRYPTION TOOL; DECRYPTION TOOL]

LEGEND

pubUK: User’s SSH public key

privUK: User’s SSH private key

pubCNK: Compute Node SSH public key

privCNK: Compute Node SSH private key

symD1: SSH symmetric key for Dataset 1

symD2: SSH symmetric key for Dataset 2

symUK: SSH symmetric key for pubUK


Short-lived keys of the Compute Node/s are to be publicly exposed, so an automatic system for renewing and provisioning them is required.

An enhanced Galaxy instance could extend the common template used to launch jobs on Galaxy runners, so that it includes the instructions for encrypting and decrypting data at the job destination.
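To make the job-template idea concrete, the sketch below builds a shell script that wraps an unmodified tool command with a decryption pre-task and an encryption post-task. It assumes the `crypt4gh` command-line tool is available at the job destination and reads its input from stdin and writes to stdout; the function name, file paths and key locations are illustrative, and this is not Galaxy's actual job template.

```python
import shlex


def wrap_tool_command(tool_cmd, inputs, outputs, cn_private_key, user_public_key):
    """Return a bash script: decrypt inputs, run the tool, encrypt outputs.

    inputs:  list of (encrypted_path, plaintext_path) pairs
    outputs: list of (plaintext_path, encrypted_path) pairs
    Assumption: the keys are files readable by the job on the compute node.
    """
    q = shlex.quote
    lines = ["set -euo pipefail"]
    # Pre-task: decrypt each input with the Compute Node private key.
    for enc, plain in inputs:
        lines.append(f"crypt4gh decrypt --sk {q(cn_private_key)} < {q(enc)} > {q(plain)}")
    # The analysis tool itself is left unaltered.
    lines.append(tool_cmd)
    # Post-task: encrypt each output for the User's public key.
    for plain, enc in outputs:
        lines.append(
            f"crypt4gh encrypt --sk {q(cn_private_key)} "
            f"--recipient_pk {q(user_public_key)} < {q(plain)} > {q(enc)}"
        )
    # Remove every plaintext file once the encrypted outputs exist.
    plaintext = [p for _, p in inputs] + [p for p, _ in outputs]
    if plaintext:
        lines.append("rm -f " + " ".join(q(p) for p in plaintext))
    return "\n".join(lines)
```

For example, `wrap_tool_command("mytool in.txt out.txt", [("in.c4gh", "in.txt")], [("out.txt", "out.c4gh")], "cn.sec", "user.pub")` yields a five-line script whose third line is the original tool command.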

Laia Codó (1,2), Sveinung Gundersen (3), Kjell Petersen (4), José M. Fernández (1,2), María Chavero-Diez (1,2), Pável Vázquez (3), Abdulrahman Azab (3), Marco A. Tangaro (5), Krzysztof Poterlowicz (6), Frederik Coppens (9), Josep Ll. Gelpí (1,2,7), Björn Grüning (8), Salvador Capella-Gutierrez (1,2)


Contact

ELIXIR All Hands 2022, 7-10 June, Amsterdam, Netherlands