1 of 33

NSF Award: 1925001

CloudBank at Five Years

Cloud Forum

May 2, 2024

Rob Fatland (UW), Oorjit Chowdhary (UW), Naomi Alterman (UW),

Shava Smallen (UCSD), Mike Norman (UCSD), Ed Lazowska (UW), Kathy Yelick (UCB), Eric Van Dusen (UCB), Sarah Stone (UW), Vince Kellen (UCSD)

2 of 33

Topics

  • Overview: CloudBank
  • Adoption history
  • CloudBank EOT
  • Looking Ahead

* after a remark on NAIRR

3 of 33

Overview

Here we have the rogues' gallery: the CloudBank leadership team

Mike Norman (PI)

Distinguished Prof. UCSD

Ed Lazowska (co-I)

Bill & Melinda Gates Chair UW

Shava Smallen (co-I)

Project Mgr. / Portal Architect (SDSC)

Vince Kellen (co-I)

CIO UCSD

Kathy Yelick (co-I)

VCR UCB

James Mitchell (partner)

Founder & CEO, Strategic Blue

4 of 33

A Cloud Vision

for Data-Driven Science

Why: Research is important and difficult ⇒ cloud addresses obstacles ⇒ democratized access

How: We put ourselves in the shoes of a research group

  • Listen
  • Research arc: Where can we contribute? (Example: Exit strategy)

What:

  • Culture of team-building: Vendors, admins, students, libraries, …
  • Facilitate access: NSF funding, portal, EOT, help desk, community
  • Build, document, share solutions, consult

5 of 33

What is CloudBank?

  • 5-year pilot project: Facilitate cloud access for computer science researchers and educators
    • Currently in year 5; plan to extend by 6
  • Organized around a powerful portal that provides multi-cloud access ⇒ account management / monitoring
  • Collaboration: SDSC, UCSD, UCB, UW
  • FinOps capability: Strategic Blue + UCSD

6 of 33

CloudBank helps NSF CISE researchers leverage public cloud resources by providing

  • access to multiple public clouds
  • account management tools
  • financial operations
  • classroom tools
  • training
  • help desk support
  • savings (no indirect costs, IDC)

User Portal

7 of 33

Challenges and Paradoxes

  • $$$???
  • Building CI for research computing (bespoke, long tail)
  • How to spend $1 on $1 of cloud resources (not $10)

  • Cloud is always the way to go sometimes
  • Cloud solves problems you don’t know you have
  • Get on the cloud… and ideally forget about the cloud
  • The paradox of the cloud provider

  • Research is a complex (long tail) ecosystem
  • Abstractions can help

8 of 33

CloudBank Portal

⇒ Mimics online banking

Directly access cloud consoles

Monitor spend across multiple clouds

Manage access w/ your institution’s user directory

9 of 33

10 of 33

Curate Resources and Jargon

CloudBank: ‘Work with us on how to present your stack!’

Vendors: ‘Absolutely!’

11 of 33

Adoption History

12 of 33

History of Adoption

[Chart: Awards, annotated with "DCL 22-087 published (v2)" and "Ed Lazowska mass email"]

PY3, May 2022: DCL 22-087 released, enabling PIs with active CISE awards to request cloud funds directly from CloudBank.

PY4, March 2023: Ed Lazowska emailed 5000 PIs to let them know about CloudBank.

13 of 33

Spend distribution function (mid-2023)

Long tail

14 of 33

15 of 33

Impact on research productivity

16 of 33

What are CloudBank customers doing?

Chip Design

Cloud Bursting / Scaling

Quantum Computing

Edge Computing

Network Measurement

Artificial Intelligence

17 of 33

Three CloudBank Case Studies

  • Connectomics
  • Creating STEM Learning Materials (DHH accessibility)
  • Cancer research

18 of 33

Education Outreach and Training

19 of 33

CloudBank EOT Framework

20 of 33

CloudBank portal links to 101 videos

21 of 33

Ground Up: Teaching Opportunities

  • MSE544: Data science; School of Engineering; UW
      • Hijack this course for three weeks: Cloud CI-build
      • VMs, images, NoSQL, serverless, containerization
      • Debugging
      • Azure and VS Code

22 of 33

Data System Composition

The Cloud

Us

FUNCTION

HTTP get >

1 Hydrogen 1 1766 Henry Hydrogen…

1 Helium 4 1868 Elaine Helium…

2 Lithium 7 -845 Socrates…

2 Beryllium 9 2150 Dr. Lazarus…

.

.

The Internet

Us-virtual
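A minimal sketch of the composition pictured above: a client issues an HTTP GET, and a cloud function answers with rows from a small table. The rows are copied from the slide (elided names left as they are). The column names, the `period` query parameter, and the `handle_get` function are assumptions made for illustration; the slide names none of them, and a real deployment would wrap this logic in a vendor's function runtime and a NoSQL store.

```python
import json
from urllib.parse import parse_qs

# Column names are assumptions; the slide shows the rows without headers.
COLUMNS = ("period", "element", "mass", "year", "name")

# Rows copied from the slide, standing in for a NoSQL table.
TABLE = [
    (1, "Hydrogen", 1, 1766, "Henry Hydrogen…"),
    (1, "Helium", 4, 1868, "Elaine Helium…"),
    (2, "Lithium", 7, -845, "Socrates…"),
    (2, "Beryllium", 9, 2150, "Dr. Lazarus…"),
]


def handle_get(query_string: str) -> tuple[int, str]:
    """Hypothetical serverless-style handler.

    Takes the query string of an HTTP GET (for example "period=2") and
    returns an (HTTP status, JSON body) pair. With no "period" parameter
    it returns every row.
    """
    params = parse_qs(query_string)
    rows = [dict(zip(COLUMNS, row)) for row in TABLE]
    if "period" in params:
        try:
            wanted = int(params["period"][0])
        except ValueError:
            return 400, json.dumps({"error": "period must be an integer"})
        rows = [r for r in rows if r["period"] == wanted]
    # ensure_ascii=False keeps the "…" characters readable in the response.
    return 200, json.dumps(rows, ensure_ascii=False)
```

Calling `handle_get("period=2")` plays the role of "HTTP get >" in the diagram. Keeping the handler a pure function means it can be exercised locally before any cloud resource is created.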

23 of 33

CloudBank Curriculum

Berkeley Data8 and 2i2c

  • Community Colleges
  • Jupyter Hubs
  • Student authentication
  • Lessons
  • Auto-grading
  • Learn data science

24 of 33

Abstraction Maelstrom

DCL, NAIRR, MSI, NDC-C, …?...

Technical advances from all directions

  • vendor stacks
  • academic projects (Sky Pilot)
  • open community (HuggingFace, GitHub)
  • third party (Marketplace)
  • Research Software Engineering
  • Pure Horsepower (DNN-specific hardware)
  • Curriculum (Data8) and student potential

25 of 33

Sky Pilot

  • Retrograde thinking
  • Sky Computing: A Berkeley view on the future of cloud computing
      • https://sky.cs.berkeley.edu/
  • Conventional Wisdom and Assumptions
    • SPOT is simply cheap: $0.30 per $1 of on-demand
    • SPOT is inconvenient / wasteful
    • Checkpointing: Painful
    • All cloud platforms are basically cost-equivalent
    • $ x T = constant
  • This project: With sophomore Oorjit Chowdhary
      • Do the work, validate or modify these assumptions
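One way to put numbers on these assumptions is a back-of-envelope cost model. The sketch below is illustrative only: the 0.30 spot ratio comes from the slide, while the `rework_fraction` parameter (extra work redone after preemptions) and every function name are assumptions, not results from the project.

```python
def spot_job_cost(on_demand_price: float, hours: float,
                  spot_ratio: float = 0.30,
                  rework_fraction: float = 0.0) -> float:
    """Estimated cost of a job on spot instances.

    on_demand_price: on-demand price per hour (any currency unit).
    hours: run time of the job with no interruptions.
    spot_ratio: spot price as a fraction of on-demand (0.30 per the slide).
    rework_fraction: assumed fraction of the work repeated after preemptions.
    """
    return on_demand_price * spot_ratio * hours * (1.0 + rework_fraction)


def break_even_rework(spot_ratio: float = 0.30) -> float:
    """Rework fraction at which spot costs the same as on-demand."""
    return 1.0 / spot_ratio - 1.0


def cost_time_products(runs: list[tuple[float, float]]) -> list[float]:
    """Price-per-hour times run time for each (price, hours) pair.

    If "$ x T = constant" holds, every value in the result is the same.
    """
    return [price * hours for price, hours in runs]
```

Under this model a spot ratio of 0.30 tolerates redoing a little more than twice the original work before on-demand becomes cheaper, which is the kind of claim that measured preemption and checkpoint costs can confirm or refute.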

Dorothy Documents to Demystify - Repro Game

26 of 33

GitHub

Python

Instance Benchmarking

Sky Pilot

“automatically pick the cheapest cloud”

CIFAR-10: ML Task
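The idea behind "automatically pick the cheapest cloud" can be sketched in plain Python: combine a price per hour with a benchmarked run time and take the minimum total. This is not the Sky Pilot API. The `Offer` class, the cloud and instance names, and all prices and run times below are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    cloud: str            # placeholder cloud name
    instance: str         # placeholder instance type
    hourly_price: float   # price per hour (placeholder value)
    runtime_hours: float  # benchmarked run time for the task (placeholder value)

    @property
    def total_cost(self) -> float:
        # Total cost of one run: price per hour times hours to finish.
        return self.hourly_price * self.runtime_hours


def cheapest(offers: list[Offer]) -> Offer:
    """Return the offer with the lowest total cost for the task."""
    if not offers:
        raise ValueError("no offers to compare")
    return min(offers, key=lambda offer: offer.total_cost)


# Hypothetical benchmark results for a single training task.
offers = [
    Offer("cloud_a", "gpu_small", hourly_price=1.00, runtime_hours=4.0),
    Offer("cloud_b", "gpu_large", hourly_price=2.50, runtime_hours=1.2),
    Offer("cloud_c", "gpu_mid", hourly_price=1.50, runtime_hours=3.0),
]
best = cheapest(offers)
```

With these placeholder numbers the most expensive instance per hour wins on total cost because it finishes soonest, which is why instance benchmarking has to come before the price comparison.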

27 of 33

NAIRR is coming soon!

“Those matrix multiplications are truly creative and insightful!”

-said nobody ever

…meanwhile: what really powers gen-AI…

“The Ascribe Game”

28 of 33

AI usage on CloudBank

227 of 350 awards relate to artificial intelligence

  • Neural
  • Vision
  • Reinforcement
  • NLP
  • DNN
  • LLM

It’s all about GPUs

29 of 33

AI Usage by Services

  • 64% GPU spend versus 36% CPU
  • Azure OpenAI spend equivalent to VM spend (~$50K)
  • One Quantum award related to AI

30 of 33

Looking ahead

  • CloudBank is growing by stages
    • DCL
    • NAIRR
    • MSI
    • NDC-C
    • …and beyond!...
  • Emphasize cost efficiency
    • { security, management, services (‘Trainium’), overhead }
  • Approach cloud as an HPC peer: NSF listens
  • Climb the stack
    • CIFAR-10, ImageNet (GPU), RAG on Llama, …

31 of 33

Asks

NSF: Expand directorates (BIO, ENG, GEO, …)

Cloud providers: Make it easier to use your cloud (GPUs, egress, other win-win-win)

Administration: Awareness, advocacy, training

Researchers: Buy in to cloud learning

Students: Be a CI-Build rockstar

My team: Dorothy Documents to Demystify

32 of 33

In Conclusion

Identity pitch: Reactive is great but (for me) expands to fill time.

Proactive is (I claim) more interesting and impactful. Make time to have that sort of fun.

33 of 33

Questions? Compliments? Lunch?

(rob5@uw.edu)