1 of 22

The Fight Against Covid-19

Running Folding@Home (and others) on ATLAS and DESY resources

David South, DESY-ATLAS Meeting, 26th June 2020

Illustration of a SARS-CoV-2 virion

Image: Centers for Disease Control and Prevention

2 of 22

A little biology

  • The focus of research in the search for a vaccine against Covid-19 is the SARS-CoV-2 virion itself, specifically the shape and make-up of the glycoprotein “spike”, the part that binds to human cells

  • The proteins in a virus are made of long chains of amino acids, which spontaneously “fold” into compact, functional structures, and these shapes then determine the function

  • Calculating the lowest-energy conformation (shape) is computationally very expensive for proteins made up of a million atoms. It gets even harder when a protein changes shape during an interaction

  • By running simulations of this folding, the hope is to identify potentially “druggable” protein sites on the virus, to which a drug may bind

Mostly from foldingathome.org, with additional material from Ilija Vukotic

D. South. Fighting Covid-19 with F@H (and others) on ATLAS and DESY resources. 26th June 2020

3 of 22

Global interest and a genuine will to contribute

  • Many volunteer computing initiatives have focused on supporting COVID-19 research

  • Examples include: Folding@Home, Rosetta@Home, gene@home, BOINC@TACC, Quarantine@Home, OpenPandemics, Corona-AI, ..

4 of 22

How to best use ADC resources for Covid-19 research?

  • Plenty of suggestions and ideas coming in, plenty of will from the sites.. and plenty of noise
    • Make sure any contribution is of scientific worth
    • We are not the experts here!

  • Dedicated task force was established at CERN in March, with relevant contacts such as the WHO and the European Bioinformatics Institute

  • Most effective contribution via the volunteer computing initiative Folding@Home
    • Study of diseases: cancers, neurological, infectious
    • CERN is not new to volunteer computing: the LHC@Home project provides computing resources from more than 200k volunteers

5 of 22

Integrating the Covid-19 workflow into ADC

  • In general, these are complex calculations where the more cores available to the job the better (→GPUs)
    • However, Folding@Home explicitly includes CPU workloads: ideal for current LHC computing

  • Concept of external production workflows in ADC not really foreseen (as it requires a dedicated transform)
    • The easiest and fastest way was to incorporate it as an analysis type job, employing containerisation

  • Full integration of Covid-19 workflows into ATLAS distributed computing infrastructure
    • FAHClient deployment: Docker image defined on GitHub
    • Authorisation: Using a dedicated VOMS group/production role: /atlas/covid/Role=Production
    • Resource Allocation: Dedicated “COVID” global share, applicable to all ATLAS distributed resources
    • Data Management: New scope group.covid in Rucio
    • Monitoring: New ADC activity “COVID” was deployed in monit-grafana

  • Providing data management expertise for Folding@Home
    • Major consequence of rapid increase in volunteer resources: need to scale the distribution infrastructure
    • HEP framework and tools such as Rucio and FTS to be deployed to expand existing F@H services

6 of 22

Submitting the first Covid-19 jobs

  • First tried out on large and familiar resources, using new dedicated queues
    • First used 4k slots from Tier-0 (10%)
    • Another 4k slots from HLT farm at P1

  • Call to sites on April 7th via ATLAS board of Funding Agency representatives
    • Opt-in policy, initial proposal of 5% of pledge, later raised to 10%
    • Hugely positive response:

[Figure: First week of Covid-19 job submission, April 4 to April 9. Labels: Tier-0, HLT farm (P1), validated grid sites ramping up.]

“Please include [our site] to contribute to ATLAS Covid19 effort. I suppose that you will start with a small share but feel free to increase it later”

“The ATLAS Tier-2 at [site] would like to opt-in to provide at least 5% of the resources to the research on COVID-19”

“The funding agency for [country] has approved our request to participate in the ATLAS Folding @Home Covid-19 activity. We can contribute up to 10% of our pledge,..”

“I’d like to give over the [site] to the COVID jobs. I don’t care about the share, it can be 100% in case it’s useful. What is important is that the site is always filled.”

7 of 22

Covid-19 F@H jobs on the HLT farm at Point 1

  • Rapid increase and steady state of Covid-19 jobs running on the HLT farm at Point 1
    • Began with 4k, then 20k most of last month; increased to 30k after TDAQ week (approx ⅓ of total)
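The “approx ⅓ of total” figure can be checked with a short calculation. This is a minimal sketch in Python, assuming the 95k annotation on the plot is the total number of HLT slots; the function name `share_of_farm` is purely illustrative and all inputs are the rounded values quoted on the slide.

```python
def share_of_farm(covid_slots: int, total_slots: int) -> float:
    """Fraction of the HLT farm slots occupied by Covid-19 jobs."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return covid_slots / total_slots


# Assumption: 95k is the full size of the farm (taken from the plot label).
TOTAL_HLT_SLOTS = 95_000

# Rounded running-slot levels quoted in the bullet above.
for covid_slots in (4_000, 20_000, 30_000):
    fraction = share_of_farm(covid_slots, TOTAL_HLT_SLOTS)
    print(f"{covid_slots:>6} slots -> {fraction:.1%} of the farm")
```

With these rounded inputs, 30k slots comes out at a little under 32% of the farm, consistent with the quoted one third.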

[Figure: CPU slots of running jobs on the HLT farm P1, April 4 to May 20. Series: Covid-19, ATLAS Full Simulation, ATLAS Fast Simulation. Annotations: 4k, 20k, 30k, 95k, TDAQ Week.]

8 of 22

Covid-19 F@H jobs on all ATLAS CPU resources

  • Since the start of this month, we have stable running with a total of 60k slots
    • Flat 30k from the unpledged HLT at P1
    • Another 30k distributed between the 55 contributing grid sites, representing 10% of their pledge

  • Sometimes more than 60k: Covid jobs take up any spare slots in Analysis global share
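The slot budget above can be written out as a small worked calculation. This is only a sketch using the rounded numbers from the slide; the helper `covid_slot_budget` and its keys are hypothetical, and the "implied pledge" assumes that every contributing site gives exactly 10% of its pledge, which is an idealisation.

```python
def covid_slot_budget(hlt_slots: int, grid_slots: int,
                      n_sites: int, pledge_percent: int) -> dict:
    """Summarise the COVID global share from rounded slide figures."""
    return {
        # Steady-state total: HLT farm plus grid sites.
        "total_slots": hlt_slots + grid_slots,
        # Assumption: grid_slots is exactly pledge_percent of the pledged slots.
        "implied_pledged_slots": grid_slots * 100 // pledge_percent,
        # Simple average; real sites differ widely in size.
        "mean_slots_per_site": grid_slots / n_sites,
    }


budget = covid_slot_budget(hlt_slots=30_000, grid_slots=30_000,
                           n_sites=55, pledge_percent=10)
for name, value in budget.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the 55 sites together pledge of the order of 300k slots, with an average Covid-19 contribution of roughly 550 slots per site.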

[Figure: CPU slots of all running jobs in COVID gshare, April 4 to June 24. Labels: HLT farm (P1), Tier-0. Annotations: 5% gshare for sites, increase to 10% gshare for sites, TDAQ Week, extra quota from analysis share, 60k.]

9 of 22

Distribution of completed Covid-19 F@H jobs on CPUs

  • 1.283M Folding@Home jobs completed as of yesterday (25th June 2020)

[Figure: Distribution of completed Covid-19 F@H jobs by contributor: CERN (P1), USA, Germany, France, UK, Italy, Canada, Nordic, Spain, Netherlands.]

10 of 22

Covid-19 F@H jobs on ATLAS GPU resources

  • Covid-19 F@H jobs also running on the limited number of GPU resources available to ATLAS
    • Actively submitting centrally to six sites, including BNL, INFN, Manchester, MWT2, QMUL
    • Not all GPUs are the same: not all entries here are equal!

  • May 2020: Idea to use NAF GPUs as well, to improve DESY visibility
    • Effort by several people to make this happen, both from ADC and DESY-IT

  • Up to 10 GPUs in the NAF are used, with jobs centrally submitted by ATLAS, but this is in fact an “All DESY” (also Uni-HH) contribution to Folding@Home
    • DESY-HH is now the biggest GPU contributor managed by ADC

[Figure: ATLAS Folding@Home jobs running on GPUs, by site: DESY, Germany (includes CMS/Uni-HH, IT); Manchester, UK; Brookhaven, USA; Midwest-T2, USA; Bologna, Italy; Queen Mary, UK. Annotations: 35, DESY-HH.]

[Figure: Running F@H jobs in NAF monitoring.]

11 of 22

Folding@Home contributions: ATLAS

  • ATLAS and CMS are the main contributors to the Folding@Home “CERN & LHC Computing” team
    • All ATLAS CPU queues under one name ATLAS_CPU

  • GPU contributions are better rewarded by F@H in terms of credits per “work unit”, accounted separately per queue

  • Great diversity in contributors, ranging from large and small experiments to private PCs/users: Team effort

[Figure: Folding@Home “CERN & LHC Computing” team statistics (Top 20) and ATLAS daily points production. Annotations: 21st in the all time F@H ranking; over 11.5M work units processed; both ATLAS and CMS have processed > 3M work units.]

12 of 22

Folding@Home contributions: DESY

  • Looking in terms of DESY contributions, we have a strong presence in the Top 20 of the “CERN & LHC Computing” team

    • CMS-Experiment: 100% HLT + WLCG resources (up to 5% of pledge), centrally submitted by CMS. Mainly CPU

    • ATLAS_CPUs: 30% HLT + WLCG resources (around 10% of pledge), centrally submitted by ATLAS. All CPU

    • DESY-ZN_GPU: 104 GPUs, Gridengine Farm in Zeuthen, opportunistic (Götz Waschk). Closing in on ALICE/LHCb!

    • DESY-HH_GPU: Up to 10 GPUs from NAF, opportunistic, centrally submitted by ATLAS (1st of 6 “ADC GPU” donors)

..and just outside the Top 40:

    • DLAB_DESYZ: CPUs/GPUs in Zeuthen DLAB (Ingo Bloch)

[Figure: Top 20 of the “CERN & LHC Computing” team, same statistics as on the previous slide, with one entry annotated “Actually DESY-ZN_GPU ;)”.]

15 of 22

Increasing the visibility of the (FH) computing efforts

  • There is a DESY Corona Research Webpage, where the lab is (rightly) focussing on the direct research going on at PETRA III
    • X-ray screening of the virus itself
    • Looking for drugs that bind to the molecule
    • Fast track access for Corona-related projects

  • A little further down you’ll find an article about the FH computing efforts using not only Folding@Home but also Rosetta@Home, which is running on:
    • HH: 500 out-of-warranty grid worker nodes
    • ZN: Zeuthen HPC farm
    • Maxwell cluster: Hamburg HPC farm

  • A counter has been developed, currently hosted on the IT page, combining these efforts
    • We are trying to get this onto the DESY Corona page

16 of 22

Summary

  • By running Folding@Home simulations, which are fully integrated into the ADC infrastructure, ATLAS continues to make significant contributions to the fight against Covid-19
    • Implementation has also proven useful in terms of experience with containerised workflows and GPUs, including the adaptations needed to get local resources (NAF) plugged into the ATLAS distributed computing

  • Need to consider how to wind down this activity at some point, most likely as MC need increases again

  • We are trying to improve the visibility of the Covid-19 research efforts of the (FH) DESY computing groups

[Figure legend: Group Production of DAODs, Monte Carlo Full Simulation, Monte Carlo Reconstruction, User Analysis, Event Generation, Covid-19, Data, MC Fast Sim, Group Analysis.]

17 of 22

Back Up

18 of 22

Opportunistic resources: The high level trigger farm

  • Use of HLT for ATLAS simulation jobs when not needed by TDAQ
    • This is a huge resource, since 2019 up to 95k slots available
    • Essentially only used for simulation jobs: Stable release and small inputs

  • Significant contribution: 24% of ATLAS simulation in 2019 on P1 (6/24 Billion events)
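The quoted 24% and the ratio 6/24 (which is 25%) are compatible if the event counts are rounded. The sketch below illustrates this under the assumption, not stated in the talk, that both counts were rounded to the nearest billion; `ratio_bounds` is a hypothetical helper.

```python
def ratio_bounds(numerator: float, denominator: float,
                 half_width: float = 0.5) -> tuple[float, float]:
    """Range of numerator/denominator if both values carry a rounding
    uncertainty of +/- half_width (here: half a billion events)."""
    low = (numerator - half_width) / (denominator + half_width)
    high = (numerator + half_width) / (denominator - half_width)
    return low, high


# Rounded values from the slide, in billions of simulated events in 2019.
p1_events, all_events = 6, 24

low, high = ratio_bounds(p1_events, all_events)
print(f"nominal share: {p1_events / all_events:.1%}")
print(f"allowed range: {low:.1%} to {high:.1%}")
```

The quoted 24% lies inside the resulting range, so the two numbers on the slide are consistent within rounding.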

[Figures: Sim@P1 slot usage, Jan 2017 to May 2020 (annotations: 2017 data taking, 2018 data taking, cooling and major network infrastructure work, 59k slots, 95k slots); ATLAS simulated events 2019, split into GRID and Sim@P1.]

D. South. ADC Summary, DESY-ATLAS Meeting, 5th June 2020

19 of 22

CMS F@H

  • CMS has a different strategy, mainly running on the HLT (60k cores)
    • See the presentation by A. Perez-Calero in the May GDB session

  • Grid sites contributing an additional 5k, incl. DESY-HH

[Figures: All CMS jobs at DESY-HH (10k) compared with Folding@Home only (1k); CMS Folding@Home on grid (5k). Axis dates shown: April 14, April 16, May 6, June 9.]

D. South et al. COVID@DESY, NUC, 11th June 2020

20 of 22

DLAB in Zeuthen

  • DLAB in Zeuthen is running F@H on a mixture of CPUs and GPUs

  • Some details:
    • 2 x Xeon E5-2643 0 @ 3.3 GHz (just CPU)
      • 2 x 8 threads
    • 1 x Xeon X5650 @ 2.67 GHz
      • 11 threads on 6 CPU cores
      • 1x NVidia Quadro K2200
    • 1 x Xeon X5650 @ 2.67 GHz
      • 23 threads on 12 CPU cores
      • 1x NVidia Quadro K2200
    • 1 x i5-8600T @ 2.3 GHz
      • 6 threads, just CPU
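The inventory above can be tallied with a few lines of Python. This is a sketch only: it assumes each top-level bullet describes one host and that “2 x 8 threads” means 16 threads in total; the `Host` class and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    cpu: str            # CPU description as given on the slide
    fah_threads: int    # threads given to Folding@Home
    gpus: int = 0       # number of NVidia Quadro K2200 cards


DLAB_HOSTS = [
    Host("2 x Xeon E5-2643 0 @ 3.3 GHz", fah_threads=2 * 8),  # assumed 16 in total
    Host("1 x Xeon X5650 @ 2.67 GHz", fah_threads=11, gpus=1),
    Host("1 x Xeon X5650 @ 2.67 GHz", fah_threads=23, gpus=1),
    Host("1 x i5-8600T @ 2.3 GHz", fah_threads=6),
]

total_threads = sum(host.fah_threads for host in DLAB_HOSTS)
total_gpus = sum(host.gpus for host in DLAB_HOSTS)
print(f"{len(DLAB_HOSTS)} hosts, {total_threads} CPU threads, {total_gpus} GPUs")
```

Under these assumptions the DLAB contribution amounts to 56 CPU threads and 2 GPUs.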

21 of 22

DESY Zeuthen GPUs

  • A considerable number of GPUs are running Folding@Home jobs in Zeuthen on the GridEngine farm with a low share

  • A total of 37 nodes of several generations, each with 2 to 8 GPUs: 104 GPUs in total

  • Huge impact compared to others for an individual donor
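As a quick sanity check of the numbers above, the average GPU density per node can be computed and compared with the quoted range of 2 to 8 GPUs per node. The function name `mean_gpus_per_node` is illustrative; the inputs are the figures from the slide.

```python
def mean_gpus_per_node(total_gpus: int, n_nodes: int) -> float:
    """Average number of GPUs per node."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    return total_gpus / n_nodes


N_NODES, TOTAL_GPUS = 37, 104
MIN_PER_NODE, MAX_PER_NODE = 2, 8

mean = mean_gpus_per_node(TOTAL_GPUS, N_NODES)
# The total must lie between "all nodes minimal" and "all nodes maximal".
consistent = N_NODES * MIN_PER_NODE <= TOTAL_GPUS <= N_NODES * MAX_PER_NODE
print(f"mean GPUs per node: {mean:.2f}, consistent with 2 to 8 per node: {consistent}")
```

The average of just under 3 GPUs per node suggests that most nodes sit at the lower end of the 2 to 8 range.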

22 of 22

Rosetta@Home contributions

  • At DESY-HH:
    • Running protein folding simulations via BOINC
    • Built a Singularity container on CVMFS for easier deployment
      • Dedicated out-of-warranty nodes with ~500 cores to Rosetta@Home
      • DESY is a significant contributor, having provided ~35,000 CPUh since April

[Figure: example utilization of batch0106.]

“HEPGridVolunteerDE” team 74th in May

  • At DESY-ZN: HPC farm contributing under donor name “gw666”

  • Maxwell cluster: Rosetta@Home jobs also running there
