SkyCamp 2024

Ion Stoica

October 23, 2024

Sky: The problem

[Figure: Today’s Cloud, a fragmented infrastructure: Apps A, B, and C each run on a separate set of cloud services. Label: 1st Sky Retreat, January 2022.]

Sky: previous solutions

Portability Layer

Hides differences between clouds

(e.g., Grid Computing, Anthos, Azure Arc)

[Figure: today’s fragmented cloud (Apps A, B, and C on separate cloud services) vs. the same apps running on a portability layer that sits between the apps and the clouds.]

Too low level, too complex, no incentives

Sky: our solution

[Figure: today’s fragmented cloud vs. Apps A, B, and C drawing on cloud services from multiple clouds.]

Market for cloud services

Leverage similar or identical services running on multiple clouds

Examples of services:

  • Compute (VMs, hosted k8s)
  • Storage (S3, Azure Blob Storage, Google Cloud Storage)
  • Hosted Spark (EMR, Dataproc, Databricks)
  • AI endpoints (OpenAI, Anthropic, Fireworks, Together, …)

Creating the Market: Intercloud Broker

Users: submit job requests specifying

  • Code and required resources/services
  • Desired criteria (performance, cost, location)

[Figure: Apps 1, 2, and 3 send job requests to intercloud brokers, which place them on cloud services across multiple clouds.]

Intercloud Broker

Broker

  • Collects information about what services clouds offer

Intercloud Broker

Broker

  • Places jobs on appropriate cloud(s) based on requirements
  • Oversees computation and restarts upon failures, etc.

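A minimal sketch of the broker loop described above, under stated assumptions: the offer catalog, the cloud and region names, and the prices are made up, and `launch` is a hypothetical stand-in for a real provisioning call. The broker filters collected offers by the job's requirements, orders them by price (one possible criterion), and when a launch fails it moves on to the next candidate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Offer:
    """One service offering the broker has collected (hypothetical catalog entry)."""
    cloud: str
    region: str
    accelerator: str
    price_per_hour: float


@dataclass
class JobRequest:
    """Required resources plus the user's placement criteria."""
    accelerator: str
    max_price_per_hour: Optional[float] = None
    allowed_regions: Optional[Set[str]] = None


def candidates(job: JobRequest, catalog: List[Offer]) -> List[Offer]:
    """Offers that satisfy every constraint in the request, cheapest first."""
    feasible = [
        o for o in catalog
        if o.accelerator == job.accelerator
        and (job.max_price_per_hour is None or o.price_per_hour <= job.max_price_per_hour)
        and (job.allowed_regions is None or o.region in job.allowed_regions)
    ]
    return sorted(feasible, key=lambda o: o.price_per_hour)


def place_with_failover(
    job: JobRequest,
    catalog: List[Offer],
    launch: Callable[[Offer], bool],
) -> Tuple[Offer, List[Offer]]:
    """Launch on the cheapest feasible offer; if that fails, try the next one."""
    tried: List[Offer] = []
    for offer in candidates(job, catalog):
        tried.append(offer)
        if launch(offer):  # hypothetical stand-in for a real provisioning call
            return offer, tried
    raise RuntimeError(f"no feasible offer succeeded after {len(tried)} attempts")
```

A real broker would also weigh the other criteria from the job request (performance, location) and would re-run placement when an already-running job fails, not only at launch time.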

The age of GenAI

[Figure: timeline from January 2022 to September 2024 (marks at January 2022, August 2022, November 2022, March 2023, May 2024, and September 2024) showing GPT-3.5, GPT-4, GPT-4o, and o1 alongside LLaMA, Llama 2, Llama 3, and Llama 3.2.]

Double down on AI workloads and applications

Research: two tracks

Tame heterogeneity:

  • Abstract away differences between similar services

Exploit heterogeneity:

  • Reduce cost and improve performance, availability, etc.
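To illustrate the first track, here is a small sketch of abstracting away differences between similar services: two hypothetical storage clients with different method names are wrapped behind one `ObjectStore` interface, so application code is written once. The client classes are in-memory stand-ins, not real cloud SDKs.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class ObjectStore(ABC):
    """Common interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class BucketClient:
    """In-memory stand-in for one provider's native client (hypothetical API)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def upload_object(self, bucket: str, name: str, body: bytes) -> None:
        self._objects[(bucket, name)] = body

    def download_object(self, bucket: str, name: str) -> bytes:
        return self._objects[(bucket, name)]


class BlobClient:
    """In-memory stand-in for another provider's client with a different shape (hypothetical API)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write_blob(self, path: str, payload: bytes) -> None:
        self._blobs[path] = payload

    def read_blob(self, path: str) -> bytes:
        return self._blobs[path]


class BucketStore(ObjectStore):
    """Adapter: maps the common interface onto BucketClient."""

    def __init__(self, client: BucketClient, bucket: str) -> None:
        self.client, self.bucket = client, bucket

    def put(self, key: str, data: bytes) -> None:
        self.client.upload_object(self.bucket, key, data)

    def get(self, key: str) -> bytes:
        return self.client.download_object(self.bucket, key)


class BlobStore(ObjectStore):
    """Adapter: maps the common interface onto BlobClient."""

    def __init__(self, client: BlobClient, container: str) -> None:
        self.client, self.container = client, container

    def put(self, key: str, data: bytes) -> None:
        self.client.write_blob(f"{self.container}/{key}", data)

    def get(self, key: str) -> bytes:
        return self.client.read_blob(f"{self.container}/{key}")


def copy_object(src: ObjectStore, dst: ObjectStore, key: str) -> None:
    """Application logic written once, independent of which provider backs each store."""
    dst.put(key, src.get(key))
```

The adapter pattern keeps provider-specific details in one place per service; the second track (exploiting heterogeneity) then becomes a matter of choosing which adapter to use for each job.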

Sky Projects

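[Figure: Sky projects mapped onto the Sky stack (Apps, Intercloud Broker, Compatibility set, Clouds). Projects shown: LOTUS, SkyStorage, ⚔️ Chatbot Arena ⚔️, MemGPT, Skydentity, SkyServe, R2E, Compass, and Hosted LLMs (OpenAI compat. APIs); the legend distinguishes Sky projects from 3rd-party components.]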

Auto-recovered preemptions

Managed Batch Jobs

AI Model Serving

vLLM Update

Most popular open-source inference engine for LLMs

Project stats (since June):

  • GitHub stars: 20K → 28.9K
  • Contributors: 360 → 600+
  • PyPI downloads in the last 30 days: 540K+

  • Native prefix caching
  • TPU backend support
  • Production-quality engine
  • Gaudi & Intel CPU/GPU
  • MI300X MLPerf submission
  • Resources & benchmarking

[Logos: sponsors and more contributors]
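Prefix caching lets requests that share a prompt prefix (for example, the same system prompt) reuse already-computed KV-cache blocks. The sketch below shows the general idea of hash-based block matching, assuming a toy block size of 4 tokens and SHA-256 hashes; vLLM's actual block size, hashing scheme, and eviction policy may differ and are not modeled here.

```python
import hashlib
from typing import List, Sequence, Set

BLOCK_SIZE = 4  # toy value chosen for readability


def block_hashes(token_ids: Sequence[int], block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Hash each full block together with the hash of everything before it.

    Chaining the parent hash means two blocks match only if their entire
    prefixes match, not just their own tokens. A trailing partial block is skipped.
    """
    hashes: List[bytes] = []
    parent = b""
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = tuple(token_ids[start:start + block_size])
        digest = hashlib.sha256(parent + repr(block).encode()).digest()
        hashes.append(digest)
        parent = digest
    return hashes


class PrefixCache:
    """Remembers which prefix blocks have been computed (no eviction in this sketch)."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.cached: Set[bytes] = set()

    def lookup_and_insert(self, token_ids: Sequence[int]) -> int:
        """Return how many leading tokens hit the cache, then cache this request's blocks."""
        hashes = block_hashes(token_ids, self.block_size)
        hits = 0
        for h in hashes:
            if h not in self.cached:
                break  # the first miss ends the reusable prefix
            hits += 1
        self.cached.update(hashes)
        return hits * self.block_size
```

For example, two requests that start with the same 8-token system prompt would share two cached blocks, so the second request only needs fresh KV computation for the tokens after them.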

The DSPy project - dspy.ai

Let’s program, not prompt, LMs.
Connect declarative modules into a computation graph, and compile it into a chain of optimized prompts (or LM finetunes) automatically.

How? Optimizers and assertions

Chatbot Arena LLM Leaderboard (Apr 2023 - Now)

Since launch (Apr 2023 - now):

  • 150+ frontier models
  • 800K unique monthly users
  • 2 million votes
  • 70 million user queries

Supports coding, math, vision, multilingual, ...

Chatbot Arena

  • Crowdsourced AI benchmarking
  • Simple side-by-side UI
  • Dynamic and real-world interactions
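Side-by-side votes can be turned into a ranking with a pairwise rating system. The sketch below uses the classic online Elo update as an example; the K-factor of 4, the base rating of 1000, and the vote format are assumptions, and the slide does not say which rating method the leaderboard uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Battle = Tuple[str, str, str]  # (model_a, model_b, winner) with winner in {"a", "b", "tie"}


def elo_ratings(
    battles: Iterable[Battle],
    k: float = 4.0,
    base: float = 1000.0,
    scale: float = 400.0,
) -> Dict[str, float]:
    """Process votes in order, nudging both ratings toward the observed outcome."""
    ratings: Dict[str, float] = defaultdict(lambda: base)
    score_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))  # P(a beats b) under Elo
        delta = k * (score_for_a[winner] - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta  # zero-sum: b moves by the opposite amount
    return dict(ratings)
```

One design note: online Elo depends on the order in which votes arrive; a batch fit over all votes (for example, a Bradley-Terry model) avoids that sensitivity.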

R2E – Turning Any Repository into a Programming Agent Environment

Compass: Encrypted Semantic Search with High Accuracy

[Figure: the User sends a Query to the Retriever, which searches the Knowledge DB and returns a ranked list of document ids.]

https://eprint.iacr.org/2024/1255

Jinhao Zhu, Liana Patel, Matei Zaharia, Raluca Ada Popa
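For orientation, the retrieval flow in the diagram (a query in, a ranked list of document ids out) looks like this in plaintext. The sketch assumes precomputed embedding vectors and ranks documents by cosine similarity; it performs no encryption. Compass's contribution is doing this kind of search over encrypted data, as described in the linked paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(
    query_vec: Sequence[float],
    knowledge_db: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[str]:
    """Rank document ids by similarity to the query embedding (plaintext, no encryption)."""
    ranked = sorted(
        knowledge_db,
        key=lambda doc_id: cosine(query_vec, knowledge_db[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]
```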

LOTUS

LOTUS Execution Engine

Semantic Operator Programming Model

LLM-Powered Query Engine for Processing Unstructured & Structured Data

Up to 400x faster execution with statistical accuracy guarantees
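One common way to cut LLM calls in a semantic filter is a model cascade: a cheap proxy scores every row, and the expensive model is consulted only when the proxy is unsure. The sketch below assumes that approach with fixed, hypothetical thresholds (`low`, `high`) and caller-supplied `proxy` and `oracle` functions. It is not LOTUS's API, and it does not implement the threshold calibration that a statistical accuracy guarantee would require.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def sem_filter_cascade(
    rows: Sequence[Row],
    proxy: Callable[[Row], float],
    oracle: Callable[[Row], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[List[Row], int]:
    """Keep rows that satisfy a natural-language predicate, using a two-model cascade.

    proxy(row) returns a cheap score in [0, 1]; oracle(row) is the expensive,
    trusted judgment (e.g., a large LLM call). Rows scored at or above `high`
    are kept and rows at or below `low` are dropped without calling the oracle.
    Returns the kept rows and the number of oracle calls made.
    """
    kept: List[Row] = []
    oracle_calls = 0
    for row in rows:
        score = proxy(row)
        if score >= high:
            keep = True
        elif score <= low:
            keep = False
        else:
            oracle_calls += 1  # proxy is unsure: pay for the expensive model
            keep = oracle(row)
        if keep:
            kept.append(row)
    return kept, oracle_calls
```

The speedup comes from how many rows the proxy resolves confidently; the accuracy risk comes from rows the proxy gets confidently wrong, which is what calibrating `low` and `high` on a labeled sample is meant to bound.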

GoEx: Execution Engine

Retrieval Aware Training (RAT)

Measure Hallucination!

gorilla.cs.berkeley.edu

{} Open Functions

Berkeley Function-Calling Leaderboard

Agent-Arena
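Function-calling benchmarks such as the Berkeley Function-Calling Leaderboard need to decide whether a model's generated call is correct. A minimal sketch of AST-based matching, assuming calls are written as Python-style keyword calls and that each expected argument comes with a list of acceptable values; the `call_matches` helper and its rules are illustrative, not the leaderboard's evaluation code. It requires Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import Any, Dict, List, Tuple


def parse_call(src: str) -> Tuple[str, Dict[str, Any]]:
    """Parse a single keyword-argument call such as 'get_weather(city="Paris")'."""
    node = ast.parse(src, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    if node.args:
        raise ValueError("positional arguments are not handled in this sketch")
    name = ast.unparse(node.func)
    # literal_eval only accepts literals, so arbitrary expressions are rejected.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def call_matches(generated: str, expected_name: str, allowed: Dict[str, List[Any]]) -> bool:
    """True if the call targets the expected function with exactly the expected
    argument names, and every argument value is one of the accepted values."""
    try:
        name, kwargs = parse_call(generated)
    except (SyntaxError, ValueError, TypeError):
        return False  # unparseable or unsupported output counts as wrong
    if name != expected_name or set(kwargs) != set(allowed):
        return False
    return all(kwargs[arg] in allowed[arg] for arg in kwargs)
```

Matching on the parsed structure rather than the raw string means formatting differences (argument order, spacing, quote style) do not cause false failures.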

Technical collaboration and pull requests into Gorilla from:

Used at:

Letta: agent-serving framework

Deploy “agents-as-a-service” behind a REST API (backed by a FastAPI + Postgres server)

Design, debug, and deploy agents in the ADE (Agent Development Environment)

Long-term, highly personalized memory with techniques based on MemGPT

[Figure: an Application talks to the Service through a REST API; the service manages Organizations, Agents, Tools, and Data Sources (identified by user_id and agent_id) and calls LLMs.]
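MemGPT's core idea is to treat the LLM's limited context like main memory and page older information out to external storage that can be searched later. The sketch below is a deliberately simplified version: FIFO eviction and keyword search stand in for the model-directed memory management and retrieval a real system would use, and the `TieredMemory` class is hypothetical, not Letta's API.

```python
from collections import deque
from typing import Deque, List


class TieredMemory:
    """Bounded working context plus an unbounded archival store (hypothetical class)."""

    def __init__(self, max_context_messages: int = 4) -> None:
        self.max_context_messages = max_context_messages
        self.context: Deque[str] = deque()  # what would be placed in the LLM prompt
        self.archive: List[str] = []  # evicted messages, searchable later

    def append(self, message: str) -> None:
        """Add a message; evict the oldest ones to the archive when over budget."""
        self.context.append(message)
        while len(self.context) > self.max_context_messages:
            self.archive.append(self.context.popleft())

    def search_archive(self, keyword: str, limit: int = 3) -> List[str]:
        """Case-insensitive keyword search; returns the most recent `limit` matches."""
        if limit <= 0:
            return []
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        return hits[-limit:]
```

With a two-message budget, a fact mentioned early in a conversation ("I live in Oakland") leaves the prompt but can still be recovered from the archive when it becomes relevant again.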

Agenda

10:15-11:00am SkyPilot

11:00-11:15am Break

11:15am-12:00pm DSPy

12:00-12:30pm Chatbot Arena

12:30-1:30pm Lunch

1:30-2:00pm R2E

2:00-2:30pm LOTUS

2:30-2:45pm Break

2:45-3:15pm Compass

3:15-4:00pm Gorilla

4:00-4:15pm Break

4:15-4:45pm vLLM

4:45-5:30pm MemGPT (Letta)

Our wonderful staff is ready to help

Kattt Atchley

Jon Kuroda

Ivan Ortega

Kailee Truong

We would like to thank our sponsors!

Thanks!

SkyCamp 2024

Closing Remarks

Joey Gonzalez

October 23, 2024

We would like to thank our sponsors!

We look forward to seeing you at the 2025 Winter Retreat, January 13th (Mon) - 15th (Wed), at the Hyatt in Monterey!
