1 of 12

News from the Open Science Data Federation

Frank Würthwein
Director, San Diego Supercomputer Center

October 11th 2022

2 of 12

Reminder of Vision

3 of 12


Democratize Access

4 of 12

Long Term Vision

  • Create an Open National Cyberinfrastructure that allows the federation of CI at all ~4,000 accredited, degree-granting higher education institutions, non-profit research institutions, and national laboratories.
      • Open Science
      • Open Data
      • Open Source
      • Open Infrastructure


Openness for an Open Society

Open Compute

Open Storage & CDN

Open devices/instruments/IoT, …?

5 of 12

Community vs Funded Projects


Community with Shared Vision

Lots of funded projects that contribute to this shared vision in different ways.

Community to …

… grow OSDF.

… build on OSDF.

OSDF is “owned” and “built” by the community for the community

6 of 12

Origins and Caches

  • Origins are places that own data.
  • Caches are places via which data is accessed.
  • By having a network of caches that spans the entire USA, we can provide access to any data that lives in any origin from anywhere.
    • Any Data, Anytime, Anywhere
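To make the origin/cache split concrete, here is a minimal Python sketch of a read-through cache sitting in front of several origins. It is an illustration under stated assumptions, not how the OSDF software is implemented: the `Origin` and `Cache` classes, namespace prefixes such as `/physics/`, and the longest-prefix rule for deciding which origin owns a path are all hypothetical.

```python
from typing import Dict


class Origin:
    """Hypothetical stand-in for an origin: the place that owns the data."""

    def __init__(self, objects: Dict[str, bytes]):
        self.objects = dict(objects)  # path -> content
        self.reads = 0  # how many times this origin was contacted

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.objects[path]


class Cache:
    """Hypothetical read-through cache: the place via which data is accessed."""

    def __init__(self, origins: Dict[str, Origin]):
        # Namespace prefix (ending in "/") -> origin that exports it.
        self.origins = dict(origins)
        self.store: Dict[str, bytes] = {}

    def _origin_for(self, path: str) -> Origin:
        # Longest matching prefix decides which origin owns the path.
        matches = [p for p in self.origins if path.startswith(p)]
        if not matches:
            raise KeyError(f"no origin exports {path}")
        return self.origins[max(matches, key=len)]

    def read(self, path: str) -> bytes:
        if path not in self.store:
            # Cache miss: fetch once from the owning origin, then keep a copy.
            self.store[path] = self._origin_for(path).read(path)
        return self.store[path]


if __name__ == "__main__":
    physics = Origin({"/physics/run1.dat": b"events"})
    cache = Cache({"/physics/": physics})
    cache.read("/physics/run1.dat")
    cache.read("/physics/run1.dat")
    print(physics.reads)  # 1: the second read was a cache hit
```

A repeated read of the same path is served from the cache without contacting the origin; that property is what lets a nationwide network of caches serve data owned by any origin.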


7 of 12

New Since Last Year

  • NSF-funded Prototype National Research Platform (PNRP)
    • Hardware deployment includes 8 new caches to achieve coverage across the continental USA within a 500-mile radius.
  • 9th & 10th caches in Miami & Hawaii to increase US coverage
  • ESnet adding caches at its Amsterdam & London PoPs
    • DUNE is the most prominent use case.
  • NSF CC* program awarded storage systems at 9 campuses (~5 PB per campus).
    • We expect most will integrate into the OSDF.
  • NSF added ~30,000 cores via the PATh Facility & credit program.
  • NRP adds 500 new GPUs (A10, 3080, 3090, A100)
    • This doubles the number of GPUs available.
    • Of the 1,000+ GPUs on NRP, one third were funded via the PNRP project.


Data Federation is independent of OSG Compute Federation

OSDF forms the data infrastructure for both OSG & NRP

8 of 12

Data Infrastructure Model of NRP

  • Support regional Ceph storage systems across the USA.
    • Campuses can join individual storage hosts to the Ceph system in their region.
    • All regional storage systems are Origins in OSG Data Federation (OSDF)
    • Deploy a replication system so that researchers can decide which parts of their namespace reside in which regional storage system.
  • Deploy caches in the Internet2 backbone so that no campus nationwide is more than 500 miles from a cache.
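The 500-mile target can be checked with simple great-circle arithmetic. The sketch below assumes hypothetical campus and cache coordinates given as (latitude, longitude) in degrees and a mean Earth radius of 3958.8 miles; it uses the haversine formula to list campuses whose nearest cache lies beyond the radius. Straight-line distance is only a proxy: actual access cost depends on network paths and latency.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant)


def great_circle_miles(a: LatLon, b: LatLon) -> float:
    """Haversine great-circle distance between two points, in miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp h to guard against rounding pushing it slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))


def uncovered_campuses(campuses: Dict[str, LatLon],
                       caches: Dict[str, LatLon],
                       radius_miles: float = 500.0) -> List[str]:
    """Names of campuses whose nearest cache is farther than radius_miles."""
    return sorted(
        name for name, loc in campuses.items()
        if min(great_circle_miles(loc, c) for c in caches.values()) > radius_miles
    )
```

For example, with a single cache at (0, 5) on the equator, a campus at (0, 0) is about 345 miles away and counts as covered, while a campus at (0, 20) is about 1,036 miles away and is reported as uncovered.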


NRP data infrastructure model combines the best of PRP & OSG

From PRP we take the regional Ceph storage concept

From OSG we take the data origin & caching concepts

And then we add, as an entirely new feature:

User controlled replication of partial namespaces across regions.

(We will develop this during the 3-year “testbed” phase.)

We want others to build higher-level data services on top.
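Because user-controlled replication of partial namespaces is still to be developed, the following is only a hypothetical sketch of what a per-user policy could look like: each namespace prefix (ending in `/`) maps to a set of regions, the longest matching prefix wins, and the home region always keeps a copy. The function names, the region names `west`/`east`/`central`, and the override-rather-than-merge rule are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical policy: namespace prefix ending in "/" -> regions wanting a copy.
Policy = Dict[str, FrozenSet[str]]


def target_regions(path: str, policy: Policy, home: str) -> FrozenSet[str]:
    """Regions that should hold a copy of `path` under the policy."""
    matches = [p for p in policy if path.startswith(p)]
    # The most specific (longest) prefix overrides its parents.
    wanted = policy[max(matches, key=len)] if matches else frozenset()
    # The home region always keeps the primary copy.
    return frozenset({home}) | wanted


def replication_plan(placements: Dict[str, Set[str]], policy: Policy,
                     home: str) -> List[Tuple[str, str]]:
    """(path, region) copies still needed, given current placements."""
    plan = []
    for path, have in sorted(placements.items()):
        for region in sorted(target_regions(path, policy, home) - have):
            plan.append((path, region))
    return plan


if __name__ == "__main__":
    policy = {
        "/alice/": frozenset({"west"}),
        "/alice/shared/": frozenset({"west", "east"}),
    }
    placements = {"/alice/shared/x.h5": {"central"}, "/bob/z.dat": {"central"}}
    print(replication_plan(placements, policy, home="central"))
    # [('/alice/shared/x.h5', 'east'), ('/alice/shared/x.h5', 'west')]
```

Letting a more specific prefix override its parent keeps a policy short: a user can replicate one shared subdirectory widely while leaving the rest of the namespace in its home region.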

9 of 12

Proposal for NSDF effort at UCSD in Y2

  • Would like to go in 2 directions:
    • Work with the Materials Science community to try out the OSDF in combination with the PATh Facility.
      • We discussed this at last year’s retreat and agreed to start once the PATh Facility was deployed.
      • The PATh Facility is now deployed.
    • Integrate higher-level data services on top of the existing OSDF
      • Bring FAIR principles to the OSDF
        • Decision point: produce software from scratch vs. integrate community-supported software.
          • Should we spend the effort to explore Dataverse, given that it appears to be a state-of-the-art upper-level software framework?


10 of 12

From Dataverse website


11 of 12

Summary & Conclusions

  • The Open Science Data Federation (OSDF) is seeing rapid growth.
    • Adding 12 new caches and potentially 9 new origins
  • NSF asked that PATh support the OSDF independently of the Open Science Compute Federation (OSCF)
    • i.e., the OSG data federation shall be usable as a standalone entity from outside the OSG compute federation.
    • NRP is adopting the OSDF as part of its “Data Infrastructure Model”
    • One of the CC* storage awardees wants to use OSDF from NRP instead of OSG.
  • We need to decide where UCSD’s NSDF effort should focus in Y2.
    • Would like to follow up on plans from last year’s retreat.
    • Would like to grow FAIR scope, some of which we already had in Y1.


12 of 12

Comments & Questions?