1 of 19

Systems Primer

Bits to disks to clouds

Why? ‘Full stack’ engineer?

2 of 19

Key Questions

  1. Why a rich variety of DBs?
  2. How to eval DBs for my data apps?

Data Size

Hardware/ Storage

Algorithms

Languages + Libraries

‘Tiny’

(< 10 GBs)

Store in RAM

‘Usual’ CS Algorithms (sort, hash, ..)

Pandas (Python) + SQL

(Spreadsheets for < 100 MBs)

‘Small’

(10GB - 10TB)

‘Big’

(> 10 TBs)

Goal: Gain intuition – What are Speed and Cost tradeoffs?

3 of 19

4 of 19

Basics

Basics

    • INT32: 4 bytes, INT64: 8 bytes
    • 2^10 = 1024, 2^20 ~= 1Million, 2^30 ~= 1 Billion (10^9), 2^40 ~= 1 Trillion (10^12)
    • To store int32 records: 1 Million records = 4MB, 1 Billion records = 4 GB
    • [Often use 1000 vs 1024, for quick approximations]

RAM – SSD/HDD Read/Writes

(aka ‘Paging’)

CPU – RAM Read/Writes

Bytes in RAM, SSDs, HDDs

5 of 19

IO

Systems

D-RAM

S-RAM

Hard Drives (HDD)

Flash/SSD/

Non-volatile mem (nvm)

Tape

L1, L2

Cache

MBs, 20 ns

GBs, 50-100 ns

1-10 TBs,

~10s microsecs

10s of TBs-PBs, 10s of msecs

Cost/bit

Speed

Volatile -- data lost when you turn off power

Non-Volatile

6 of 19

IO Blocks

For Efficiency

E.g., to store 1GB data, we partition the data, and store in IO Blocks

Data

(1 GB)

IO Blocks

‘Partition’ into IO blocks

7 of 19

IO Blocks

For Efficiency

Key Concepts

  1. Data is stored in Blocks (aka “partitions”)
  2. Sequential IO is 10x-100x+ faster than ‘many’ random IO
    • E.g., 1 MB (located sequentially) versus 1 Million bytes in random locations
  3. HDDs/SSDs copy sequential ‘big’ blocks of bytes to/from RAM

1 Package

1000 Packages

(1 Trailer)

3000 Packages

(3 Trailers)

‘M’ Trucks

1 Byte

64MB-Blocks

(64MBs of sequential, aka contiguous bytes

Nx64MB-Blocks

(‘N’ sequential 64MB-Blocks)

Set of ‘M’

Nx64MB-Blocks

8 of 19

IO Blocks

For Efficiency

E.g., to store 1GB data, we need 15.625 64MB-Blocks (= 1GB/64MB) in

  • Sixteen 64MB-Blocks, (1x64MB = 64MB) OR
  • Eight 2x64MB Blocks, (i.e, M = 8, N = 2), OR
  • Four 4x64MB Blocks, OR Two 8x64MB Blocks, OR One 16x64B Block
  • (OR some combo)

2x64MB

4x64MB

Data

(1 GB)

IO Blocks

9 of 19

Read,

Write

IO Blocks

Data stored in ‘M’

Nx64MB Blocks

// Read M Nx64MB-Blocks for Data

ReadData(M) For i:1…M Blocks

    • ReadBlocks(startLocation[ith Block], N)

Read/Write Data into RAM

// Read N contiguous Blocks from startLocation

ReadBlocks(startLocation, N Blocks)

  1. Access Block’s startLocation in IO device
  2. Scan N Blocks in one efficient operation. That is
    • Read N*64MB (or multiple) contiguous bytes to RAM

WriteBlocks is similar to ReadBlocks, except

2b. Write N*64MB (or multiple) contiguous bytes from RAM

10 of 19

Basic system numbers

Access

Latency (secs)

Scan Throughput (GBs/sec)

Cost/TB

(Aug’ 24)

RAM

(D-RAM)

~100 nanosec

~100-150 GB/sec

~3125$/TB

High-end SSD

~10 microsec

~5-10 GB/sec

~100$/TB

~10 millisec

~100-200 MB /sec

~25$/TB

Machines M1 to M2 (Network)

~ 1 microsec

~ 5 GB/sec

N/A

Access Latency (secs) = Time to access Block’s start location

Scan Throughput (GB/sec) = Speed to Scan + Copy data to RAM

Note: Why are they different speeds?

  • Digital (e.g. SSDs and RAM) vs Analog (e.g. HDD seeks)
  • Distance from CPU (e.g. RAM is on same chip as CPU vs SSD)

11 of 19

IO Cost Model

Access Latency (secs)

Scan Throughput (GBs/sec)

Total Time to read one 64MB-Block

RAM

(D-RAM)

~100 nanosec

~100 GB/sec

High-end SSD

~10 microsec

~5 GB/sec

~10 millisec

~100 MB /sec

Machine 2 Machine (Network)

~ 1 microsec

~ 5 GB/sec

100 + 64MB/100GB/s

= 100 + 640000 nsec

10 + 12800 microsec

10 + 640 millisec

1 + 12800 microsec

Total Time to ReadData =

AccessLatency * M + DataSize/ScanThroughput

Where

  • DataSize = data size (in bytes)
  • M = Number of non-contiguous Blocks
  • AccessLatency = Time to access Block
  • ScanThroughput = Speed to copy/scan to RAM

12 of 19

Example

Data size

Q1: What’s row size (i.e., size of each record)?

SID: int32 ⇒ 4 bytes

Student: char[100] ⇒ 100 bytes

Address: char[200] ⇒ 200 bytes

Bio: char[720] 720 bytes ## Note: Picking so row size=1024

⇒ Each row is 1024 bytes

Q2: What’s table size

  1. with 1000 student rows? 1000 * 1024 bytes = 1 MB
  2. with 1M student rows? 1M * 1024 bytes = 1 GB
  3. With 1B student rows? 1B * 1024 bytes = 1 TB

Students

SID

Name

Address

Bio

40001

Mickey

43 Toontown

Mickey is a Sophomore in CS. He is...

40002

Daffy

147 Main St

Daffy is part of the Orchestra. He was...

50003

Donald

312 Escondido

Donald is a 1st year MS in EE. He was...

50004

Minnie

451 Gates

10008

Pluto

97 Packard

13 of 19

Example

Data speed

Access Latency for

1 Block (secs)

Scan Time for

1 TB (sec)

RAM

(D-RAM)

~100 nanosec

1TB/100 GBps

= ~10 sec

High-end SSD

~10 microsec

1TB/5GBps = ~200 sec

~10 millisec

1TB/100MBps =~10,000sec

Copy to RAM on Machine Z1 from RAM on Machine Z2

(Network) 1 microsec + (RAM on Z1) 100 nsec + (RAM on Z2) 100 nsec

~= 1.2 microsec

[10,000x faster vs reading from Z1’s HDD

Or 10x faster vs reading from Z1’s SSD]

(Network) 1 TB/5GBps

+ (RAM on Z1) 1 TB/100GBps

+ (RAM on Z2) 1 TB/100GBps

~= 220 secs

[50x faster vs reading from Z1’s HDD (~10,000 secs)

Or 10% slower vs reading from Z1’s SSD]

Q3: For 1 Billion student table of size 1TBs (stored in M=1)

[Recall Total Time to ReadData =

AccessLatency * M + DataSize/ScanThroughput ]

14 of 19

Example

Data speed

Access

Latency for 1 Block (secs)

(Same as Q3)

Scan Time for

1 TB (sec) with 100x more capacity?

(Same numbers as Q3, divided by 100)

RAM

(D-RAM)

~100 nanosec

10 sec/100

High-end SSD

~10 microsec

200 sec/100

HDD

~10 millisec

~10,000/100sec

Copy to RAM on Machines Z1..100 from RAM on Z101..200

~= 1.2 microsec

~= 220/100 secs

Q4: With 100 parallel machines? (100x RAM, 100x disks)

15 of 19

Example: Find a student, by scanning data

Find ‘Daffyfrom a DB of billion students (1 TB)

Design Choices

Data in RAM

(Scan sequentially & filter)

Data in HDD (no IO Blocks; rows in random spots)

(Access each row. Look for Daffy. Repeat for every row)

Data in 64MB-Blocks in HDD

(Access each block. Scan rows, look for Daffy. Repeat per 64MB-block)

(@100$/32GB)

3000$

(@25$/TB of disk)

25$

(@25$/TB of disk)

25$

(AccessLatency) 100 nsec + (Scan) 1000 GB / 100 GBps

~= 10 secs

Storage Cost

Time

(AccessLatency) 10 msec * 1 Billion rows + (Scan) 1 TB /100 MBps

= 10^7 secs + 10^4 secs

~= 115 days

Number of 64MB-blocks = 1 TB/64MB = 15625

(AccessLatency) 10 msecs *15625 blocks + (Scan) 1 TB /100 MB-sec

= 10.15^4 secs ~= 3 hrs

In 2 weeks, we’ll see how to do this a lot faster (in msecs) with good Indexes (e.g, hashing)

Design1

Design2

Design3

16 of 19

Example: Find a student, by scanning data

Find ‘Daffyfrom a DB of billion students (1 TB)

Design Choices

Data in SSD (no IO Blocks; rows in random spots)

(Access each row, look for Daffy; Repeat for every row)

Data in SSD 64MB-Blocks

(Access each block, scan rows, look for Daffy; Repeat per 64MB-block)

(@150$/TB of SSD)

150$

(@150$/TB of SSD)

150$

Storage Cost

Time

(AccessLatency) 10 microsec * 1 Billion rows + (Scan) 1 TB /5GBps

= 10^4 secs + 200 secs

~= 1200 secs

Number of 64MB-blocks = 1 TB/64MB = 15625

(AccessLatency) 10 microsecs *15625 blocks + (Scan) 1 TB /5 GBps

= 10.15^4 secs

~= 200 secs

In 2 weeks, we’ll see how to do this a lot faster (in msecs) with good Indexes (e.g, hashing)

Design4

Design5

17 of 19

Clouds of machines

(Example datacenter

from 2:50 min)

Typical dedicated (non-shared) machine assumptions (unless problem states otherwise):

    • 64 GB RAM
    • Block sizes: 64 MB-DB block (and RAM page size)

    • Example: AWS/GCP offer machine instances

(e.g, ec2.r5 offers 1-3GBps network bandwidth, 2CPU/16GB RAM to 96 CPU/768GB RAM for $-$$$ in Nov’21)

18 of 19

Cloud DBs

[Optional]

Data size

Size

Storage Cost

Compute Cost

Cost Range (per yr)

Small

~5 TB

~$1300

~$50,000

~$25k-$75k

Medium

~50 TB

~$14000

~$160,000

~$100k-$200k

Large

~200TB

~$55,000$

~$400,000

~$300k-$500k

Reference Snowflake [Jun’22]

(Google for full pricing calculators for BigQuery, RedShift, etc.)

Rough rule of thumb for data sizes

Ballpark Cloud DB costs

(*For popular Tables data.

Some apps use JSON (hierarchical data) in

nonSQL DBs – see later)

‘10-100x’ Lower costs in past 10 yrs.

Primary Driver? “db admin” team → Cloud management tools

Data Size

Hardware/ Storage

Algorithms

Languages + Libraries

‘Tiny’

(< 10 GBs)

Store in RAM

‘Usual’ CS Algorithms (sort, hash, ..)

Pandas (Python) + SQL

(Spreadsheets for < 100 MBs)

‘Small’

(10GB - 10TB)

Store in SSD

CS145 Algorithms

SQL on Cloud

‘Big’

(> 10 TBs)

Store in HDD

CS145 Algorithms

SQL on Cloud

19 of 19

In this Section

  • Systems – Size, Cost, and Speeds of GB/TBs
    • On RAM, HDDs/SSDs, clouds

  • IO Cost model
    • Simple IO model for Total Time (for RAM, SSD, HDD)
    • Good estimateReality is a bit more complex to fully model (buses, cache hit rates, etc.)

  • Examples analyzing system design choices