A re-encryption REST service is being developed. To avoid exporting the User’s private key, the service runs in the User’s local environment. An extended version of the Galaxy client-side web application can reach the endpoint and carry out the transformation.
A new Galaxy data import tool is being developed. It queries the NO-FEGA REST API using the OIDC token issued by ELIXIR Life Science AAI when the User logs in to Galaxy.
Sensitive Data Accessing and Processing in Galaxy
Contact: Laia Codó Tarraubella, laia.codo@bsc.es
These projects have received funding from the European Union’s Horizon 2020 research and innovation programme
@ELIXIREurope
/company/elixir-europe
INTRODUCTION
In the context of the ELIXIR Implementation Study ‘Strengthen Data Management in Galaxy’, novel data handling strategies are being explored to enable secure and scalable access to, and processing of, sensitive datasets from Galaxy instances. To this end, we propose an approach to access, transfer and process (F)EGA Crypt4GH-encrypted datasets in a secure environment, using a multi-user public Galaxy instance as enactor and controller of the full process.
(1) Spanish National Bioinformatics Institute (INB/ELIXIR-ES); (2) Barcelona Supercomputing Center (BSC), Spain; (3) University of Oslo (UiO), Norway; (4) University of Bergen (UiB), Norway; (5) Italian National Research Council (CNR), Italy; (6) Computational and Data Driven Science, University of Bradford (UoB); (7) Dept Biochemistry and Molecular Biomedicine, University of Barcelona (UB); (8) University of Freiburg, Germany; (9) Data Core, VIB, Belgium
This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101046203 (BY-COVID)
[Figure: workflow diagram of steps 1 to 5. Area labels: Data Provider; Galaxy Public Instance; Compute Node/s; User Local Environment. Component labels: Galaxy server-side; Galaxy client-side; Keys Server; Header Re-encryptor Service; Header Re-encryptor Client; Galaxy pre-task; Galaxy post-task; Decryption Tool; Analysis Tool; Encryption Tool. Data labels: Dataset 1; Dataset 2; Header D1; Header D2; Header UK; keys pubEGAK, pubUK, privUK, pubCNK, privCNK, symD1, symD2, symUK.]
1. Data import from (F)EGA to Galaxy server
Datasets are downloaded from (F)EGA as Crypt4GH files encrypted with the User’s key, i.e. only the User is able to decrypt them. When stored in Galaxy, the header is split from the encrypted body, which contains the sensitive dataset.
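The header/body split in step 1 can be sketched with the Python standard library alone. This is a minimal sketch, not the Galaxy implementation: it assumes the container layout of the GA4GH Crypt4GH specification (8-byte magic `crypt4gh`, 4-byte little-endian version, 4-byte packet count, then length-prefixed header packets), and the function name `read_crypt4gh_header` is illustrative only.

```python
import struct

MAGIC = b"crypt4gh"


def read_crypt4gh_header(stream):
    """Read and return the raw Crypt4GH header from a binary stream.

    On return, the stream is positioned at the first byte of the
    encrypted body, so the caller can copy the rest elsewhere.
    """
    preamble = stream.read(16)
    if len(preamble) != 16 or preamble[:8] != MAGIC:
        raise ValueError("not a Crypt4GH stream")
    version, packet_count = struct.unpack("<II", preamble[8:])
    if version != 1:
        raise ValueError(f"unsupported Crypt4GH version: {version}")

    header = bytearray(preamble)
    for _ in range(packet_count):
        raw_len = stream.read(4)
        if len(raw_len) != 4:
            raise ValueError("truncated header")
        (packet_len,) = struct.unpack("<I", raw_len)
        # The packet length counts its own 4 bytes plus the
        # 4-byte encryption method field that follows it.
        if packet_len < 8:
            raise ValueError("invalid header packet length")
        rest = stream.read(packet_len - 4)
        if len(rest) != packet_len - 4:
            raise ValueError("truncated header packet")
        header += raw_len + rest
    return bytes(header)
```

After the call, the remaining bytes of the stream are the encrypted body, which can be copied to storage with `shutil.copyfileobj` without ever being decrypted.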
2. Crypt4GH header re-encryption
To enable later decryption in the Galaxy compute backend, the original header is sent to the User’s premises, where it is decrypted and encrypted again, this time with the Compute Node key.
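The shape of a local header re-encryption endpoint can be sketched as follows. This is a hedged illustration under stated assumptions: the `/reencrypt` path, the function names and the raw-bytes request format are hypothetical, and the cryptographic step itself is delegated to a caller-supplied `reencrypt_header` callable (for instance one built on a Crypt4GH library, holding privUK and pubCNK). No cryptography is implemented here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(reencrypt_header):
    """Build a handler applying reencrypt_header(bytes) -> bytes to POST bodies."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/reencrypt":  # hypothetical endpoint path
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            old_header = self.rfile.read(length)
            try:
                new_header = reencrypt_header(old_header)
            except ValueError:
                self.send_error(400, "header could not be re-encrypted")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(new_header)))
            self.end_headers()
            self.wfile.write(new_header)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return Handler


def make_server(reencrypt_header, port=0):
    # Bind to the loopback interface only, so the service and the
    # private key behind it are not reachable from other hosts.
    return HTTPServer(("127.0.0.1", port), make_handler(reencrypt_header))
```

Only the header travels to the User's machine and back; the encrypted body never leaves the Galaxy server, and the private key never leaves the local environment. A browser-based client would additionally need the service to answer cross-origin requests, which this sketch omits.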
4. Running an analysis tool: input decryption
Before an analysis tool or workflow runs, the input data is decrypted on the compute node, immediately before being processed. Analysis tools remain unaltered.
5. Running an analysis tool: output encryption
The output is encrypted with the User’s key, which has been sent from the Galaxy server as an extra Crypt4GH file. Results are downloaded and decrypted locally.
3. Galaxy Server as middleware
In the Galaxy instance, data is always kept encrypted, under the key of either the User or the Compute Node. In this way, the Galaxy administrator never has access to the User’s data.
LEGEND
pubUK: User’s SSH public key
privUK: User’s SSH private key
pubCNK: Compute Node SSH public key
privCNK: Compute Node SSH private key
symD1: SSH symmetric key for Dataset 1
symD2: SSH symmetric key for Dataset 2
symUK: SSH symmetric key for pubUK
The short-lived public keys of the Compute Node/s must be publicly exposed, so an automatic system for renewing and provisioning them is required.
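One possible shape for such a system is a registry that publishes each node's public key with an expiry time. The sketch below is purely illustrative: the class and method names are hypothetical, key generation is left to the compute node (which keeps its private key), and persistence and authentication are out of scope.

```python
import time
from dataclasses import dataclass


@dataclass
class PublishedKey:
    public_key: bytes
    expires_at: float  # seconds since the epoch


class PublicKeyRegistry:
    """In-memory registry of short-lived compute node public keys."""

    def __init__(self):
        self._keys = {}

    def publish(self, node_id, public_key, ttl_seconds, now=None):
        """Publish (or replace) a node's public key, valid for ttl_seconds."""
        now = time.time() if now is None else now
        self._keys[node_id] = PublishedKey(public_key, now + ttl_seconds)

    def lookup(self, node_id, now=None):
        """Return the node's current public key, or None if missing or expired."""
        now = time.time() if now is None else now
        entry = self._keys.get(node_id)
        if entry is None or entry.expires_at <= now:
            return None
        return entry.public_key

    def needs_renewal(self, node_id, margin_seconds, now=None):
        """True when the key is missing or expires within margin_seconds."""
        now = time.time() if now is None else now
        entry = self._keys.get(node_id)
        return entry is None or entry.expires_at - now <= margin_seconds
```

A node could poll `needs_renewal` and publish a fresh key ahead of expiry, so that headers are never re-encrypted for a key that is about to lapse.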
An enhanced Galaxy instance could extend the common template used to launch jobs on Galaxy runners, so that it includes the instructions to decrypt and encrypt data at the job destination.
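As a rough illustration of such a template extension, the sketch below generates a shell script that decrypts inputs, runs the unmodified tool command, and encrypts outputs. It is not Galaxy's job template: the function `wrap_job` and its parameters are hypothetical. It assumes the `crypt4gh` command-line tool is installed at the job destination, that each input is a complete Crypt4GH file (re-encrypted header already re-attached to its body), and that the User's public key is already available as a file on the node.

```python
import shlex


def wrap_job(tool_command, inputs, outputs, node_seckey, user_pubkey):
    """Return a POSIX shell script wrapping tool_command with pre/post tasks.

    inputs / outputs: lists of (encrypted_path, plaintext_path) pairs.
    """
    q = shlex.quote
    plaintexts = [plain for _, plain in list(inputs) + list(outputs)]
    lines = [
        "set -eu",
        # Remove every plaintext file when the job exits, even on failure.
        "cleanup() { rm -f -- " + " ".join(q(p) for p in plaintexts) + "; }",
        "trap cleanup EXIT",
    ]
    # Pre-task: decrypt inputs with the Compute Node private key.
    for enc, plain in inputs:
        lines.append(
            f"crypt4gh decrypt --sk {q(node_seckey)} < {q(enc)} > {q(plain)}"
        )
    # The analysis tool itself runs unaltered on plaintext files.
    lines.append(tool_command)
    # Post-task: encrypt outputs for the User's public key.
    for enc, plain in outputs:
        lines.append(
            f"crypt4gh encrypt --recipient_pk {q(user_pubkey)} < {q(plain)} > {q(enc)}"
        )
    return "\n".join(lines) + "\n"
```

For example, `wrap_job("fastqc in.fq", [("in.fq.c4gh", "in.fq")], [("out.zip.c4gh", "out.zip")], "node.sec", "user.pub")` yields a script whose tool line is exactly `fastqc in.fq`, so the tool needs no awareness of the encryption around it.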
Laia Codó (1,2), Sveinung Gundersen (3), Kjell Petersen (4), José M. Fernández (1,2), María Chavero-Diez (1,2), Pável Vázquez (3), Abdulrahman Azab (3), Marco A. Tangaro (5), Krzysztof Poterlowicz (6), Frederik Coppens (9), Josep Ll. Gelpí (1,2,7), Björn Grüning (8), Salvador Capella-Gutierrez (1,2)
ELIXIR All Hands 2022, 7-10 June, Amsterdam, Netherlands