
Performing Large Scale Parameter Surveys with OSG Services

Brian Bockelman, Christina Koch, Chi-Kwan Chan, Rob Quick


PIRE webinar: OSG Services and EHT - 9/8/22


OSG High Throughput Services



The OSG Consortium

Established in 2005, the OSG Consortium operates a fabric of distributed High Throughput Computing (dHTC) services in support of the national science and engineering community.


Contributions come from many members, but support comes primarily from major NSF / OAC cyberinfrastructure investments.



OSG Services

Open Science Compute Federation (OSCF)

  • Services that integrate clusters into larger resource pools.
  • These resource pools operate as a single virtual or “overlay” batch system.

Open Science Data Federation (OSDF)

  • Services that deliver input files to jobs and can store output.
  • Consists of origins (where files are safely stored) and caches (which keep file replicas close to compute capacity).

Open Science Pool (OSPool)

  • A resource pool consisting of opportunistic or donated resources.
  • Open to any federally-funded US researcher and their collaborators.

Access Points

  • Services where users place and manage their jobs.
  • Manage the workload and connect jobs to compute capacity.

…and many more.



Who Participates in the OSG Services?

The OSCF and OSDF:

  • Integrate more than 120 universities and laboratories across the planet, about half outside the US.
  • Organize capacity into resource pools based on local policy.
  • Move more than 90M files weekly.
  • Run more than 2M jobs weekly.
  • Manage jobs from more than a dozen Access Points.


The Access Point

A core concept is the Access Point (AP), a service for placing and managing your workloads and files.

  • The AP handles your jobs.
  • It moves inputs and outputs to the jobs.
  • It connects to different resource pools or allocations.

This is your “home” on the OSG!


At its heart is the HTCondor submit service, but an AP also provides:

  • File movement and management.
  • User/group management.
  • Unix account management.
  • Integration with resource provisioning.


Capacity: OSPool and the PATh Facility

To be productive, the AP needs to connect to some capacity!

  • The OSPool is an HTCondor pool consisting of over 70 compute resources; each research project gets a “fair share” of the current contributions.
  • The PATh Facility is a new, NSF-funded distributed resource optimized for HTC workflows. Access is based on “credits” granted by NSF.
  • Most ACCESS (formerly XSEDE) resource providers can also be accessed from an AP.


[Diagram: an Access Point (AP) connected to the OSPool, the PATh Facility, and ACCESS resource providers.]


OSPool Sites: CC*-funded Compute

  • Our current record is 74 universities contributing to the OSPool.
  • An important subset is the ~30 sites that have received compute clusters through the NSF CC* program.


Using OSG Connect

OSG Connect is the set of OSG-operated Access Points for placing and managing workloads. By default, computational tasks placed there run on the OSPool.

(The Access Point can also use additional capacity; talk to us for more details!)

Accessible via SSH login (from the command line or with tools like PuTTY).

Workloads are placed and managed via the HTCondor job scheduler.

Access Points also include space for data used in jobs, and different mechanisms for accessing that data from within jobs.
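For example, logging in is an ordinary SSH session. The hostname and username below are illustrative, not necessarily the real OSG Connect values; use the address and account name you are given when your account is approved:

    # connect to your assigned OSG Connect Access Point (hostname illustrative)
    ssh username@login.osgconnect.net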


Using HTCondor

HTCondor is the job scheduler used on OSG services.

Jobs are described using an HTCondor submit file.

The submit file includes the following job components:

  • A main “executable” (script or program to execute)
  • Needed files: input data, software packages, additional scripts
  • Job resource requests and requirements
  • How many jobs to submit in the workload
  • Any additional job options

Use HTCondor commands to submit and monitor the workload, as sketched below.
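Here is a minimal sketch of a submit file with those components. The script name, input file, and resource requests are illustrative (not from the webinar), but every keyword is standard HTCondor submit-file syntax:

    # job.sub -- describes a workload of 100 jobs
    executable           = run_task.sh            # hypothetical main script
    arguments            = $(Process)             # pass each job its index
    transfer_input_files = input_data.tar.gz      # hypothetical input data
    request_cpus         = 1                      # resource requests
    request_memory       = 2GB
    request_disk         = 4GB
    log                  = job.$(Cluster).$(Process).log
    output               = job.$(Cluster).$(Process).out
    error                = job.$(Cluster).$(Process).err
    queue 100                                     # how many jobs to submit

Submit it with condor_submit job.sub and watch progress with condor_q; both are standard HTCondor commands.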


Making OSG services work for you

Do you have a large workload of tasks to run?

→ Use OSG Connect Access Point(s) (command line/SSH) to place and manage your workload, using the computing capacity of the OSPool.

Do you want to use a web interface to run ipole calculations?

→ Use Gateway service (uses OSPool behind the scenes)

Do you have allocations on other resources?

→ Use HTCondor’s “annex” feature from an OSG Connect Access Point.


What’s Possible

In the past year, EHT has run 3.25 million jobs (using 1.9 million compute hours).

Many of these ran in a two-month burst of computing last fall.

For the rest of this webinar, we want to show you two ways to compute at this scale.


EHT Use Cases and Ipole Demonstration (Shell)


EHT Use Cases


What is ipole?

  • Polarized covariant radiative transfer code in C, for imaging black hole accretion systems such as those observed by the Event Horizon Telescope
  • Mościbrodzka & Gammie (2018); PATOKA pipeline (Wong et al. 2022)


[Image slide: black hole image library. Visual: Ben Prather; image library: EHT Theory WG / Chi-Kwan Chan.]


ipole Setup

  • Code:
    • https://github.com/AFD-Illinois/ipole
    • Compile with static linkage for easy distribution (otherwise, use a container: Docker or Singularity, now Apptainer); see the sketch after this list
  • Input:
    • GRMHD simulation snapshots
    • ipole “parameter files”
  • Output:
    • Black hole images in HDF5
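A rough sketch of that setup from the shell. The make invocation and the -par flag follow the ipole repository’s README; the parameter-file name is illustrative:

    # fetch and build ipole (requires libraries such as HDF5 and GSL; see the README)
    git clone https://github.com/AFD-Illinois/ipole
    cd ipole
    make

    # render one black hole image from a GRMHD snapshot, driven by a parameter file
    ./ipole -par example.par    # writes an HDF5 image as configured in example.par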


bhpire ipole “workflow” with OSG

  • https://github.com/bhpire/ipole-osg
  • A directory structure to stage input data, output images, and log files
  • A wrapper script to turn command-line arguments into ipole parameter files (sketched below)
  • An HTCondor submission script to submit a single job
  • Custom scripts to chunk large runs into small pieces
  • Additional tools to compute checksums, push results to other servers, etc.
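A minimal sketch of such a wrapper, assuming it only needs to write a parameter file and invoke ipole. The argument list and parameter keys are illustrative; the real scripts live in the repository above:

    #!/bin/bash
    # hypothetical wrapper: build an ipole parameter file from command-line
    # arguments, then run ipole so HTCondor can transfer the image back
    snapshot=$1     # GRMHD simulation snapshot
    outfile=$2      # name for the output HDF5 image

    cat > run.par <<EOF
    dump ${snapshot}
    outfile ${outfile}
    EOF

    ./ipole -par run.par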


Live Demo


The EHT Gateway - Background and Demo


What is a Science Gateway?

  • A user interface to cyberinfrastructure (CI) resources
  • Abstracts away the complexity of the underlying CI
  • Designed with user experience (UX) at the forefront
  • Built-in visualization components
  • Carries the look and feel of the project (not the CI)


Now for a Demo

A proof-of-concept gateway is available at https://eht.scigap.org


High-Level Architecture Diagram


Gateway Summary

  • The EHT Gateway provides a web-based interface to OSPool resources
  • Built on the Apache Airavata framework, which is highly customizable
    • Built-in user, group, application, and resource (compute and storage) management
    • Along with monitoring tools and security
  • The ipole application was used as a proof of concept
    • We would like to hear from the community what additions and refinements would help accomplish EHT research goals
    • Scalability to additional applications will be enhanced with containers
  • We are happy to do targeted follow-up events for users, administrators, and developers


Links to Documentation


Next Steps


Available Entry Points

Use OSG Connect Access Points:

  • Used in CK’s demo: typical SSH login plus job submission with HTCondor
  • Default access to the OSPool; can connect to other capacity
  • Can be used for ipole calculations or other high-throughput computing

Use the EHT Gateway:

  • Used in Rob’s demo: a web interface for running workloads
  • Uses OSPool resources
  • Currently runs simple ipole jobs
  • Customizable with additional applications and resources


Getting an Account

OSG Connect Access Point:

  • Apply for an account here: https://www.osgconnect.net/
  • You will be contacted to schedule a consultation.
  • Your account will be approved and you can begin placing workloads.
  • See today’s demo as a starting point! https://github.com/bhpire/ipole-osg
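For example, to start from the demo materials (the repository URL is from this slide; the commands are standard git):

    git clone https://github.com/bhpire/ipole-osg
    cd ipole-osg    # browse the directory layout and submit scripts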

EHT Gateway:

  • Still in proof-of-concept mode. Contact CIRC (circ-iu-group@iu.edu) if interested in being a test user or providing workflow descriptions.


Getting Help

For support on OSG Connect:

Contact: support@osg-htc.org

User guides: https://portal.osg-htc.org/documentation/

For support on EHT Gateway:

Contact: circ-iu-group@iu.edu

Gateways for Users: https://userdocs.airavata.org/en/latest/


Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Nos. 2030508, 1548562, 1339774, 1840003, and 1743747. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
