Programmable storage
19 June 2018
Noah Watkins
Thesis defense
Three common application I/O stack architectures
2
(1) POSIX file storage: n apps over a POSIX-ish interface plus middleware (HDF5, PLFS, MPI-IO)
(2) App-specific storage: each app over its own purpose-built store
(3) Unified storage: n apps sharing one system that exposes file, object, and block interfaces
Redundancy and specialization costs
3
Paxos is like the simplest thing ever…
System stabilization is expen$$ive
“It takes 10 years before a new storage system is trusted.”
-- Gary Grider (LANL)
The man, the myth, the legend
4
-- Brent Welch, MSST 2010
Footbag World Champion, Mixed Doubles, 1997
Eyeballs act as a proxy for reliability!
Share code-hardened sub-systems!
Outline and contributions
5
Transactional data
Compute resources
Structured data
Graph processing
Durability
Data interfaces
Metadata management
Naming resources
Shared resources
Cluster-level metadata
Programmable storage design paradigm
2
Development process
Declarative storage
In-vivo storage development
1
EuroSys ‘18
HotStorage ‘17
BDMC ‘13
PDSW ‘12
In-progress
3
Will this ever work?
Outline and contributions
6
Outline and contributions
7
New way to build storage interfaces!
Outline and contributions
8
Widely applicable, but challenges!
Outline and contributions
9
Will this ever work?
Tying it all together
Programmability: avoid duplication and specialization
10
(The same three architectures: POSIX file storage with middleware, app-specific storage, and unified storage)
n
Programmable storage
11
n+k apps over a programmable storage system that shares common sub-systems: alongside file, object, and block, new interfaces A, B, and C are composed from shared internals
(Contrast: POSIX file storage with middleware, and app-specific storage)
Programmable storage
A storage system that facilitates the reuse and extension of existing storage abstractions provided by the underlying software stack, enabling the creation of new services via composition.
Programmable storage compared to…
Software-defined storage (SDS)
Active storage
12
Motivation, use cases, and lessons learned
Similar motivation, but HPC-specific, with entire systems built from scratch
Programmable storage instead exposes internal subsystems
13
file, object, block
Consensus
Persistence
Migration
Batching
Atomic operations
, data i/o, service metadata, file type, shared resources, durability
New interfaces through composition and customization
Storage system
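The composition idea can be made concrete with a small sketch. This is an illustrative Python simulation, not Ceph code: `ObjectStore`, `cas`, and `batch_write` are hypothetical stand-ins for two reusable primitives (atomic operations and batching), from which a new append interface is composed without touching the I/O path.

```python
class ObjectStore:
    """Toy stand-in for a storage object exposing reusable primitives."""
    def __init__(self):
        self.kv = {}

    def cas(self, key, expected, new):
        """Atomic-operations primitive: compare-and-swap on one key."""
        if self.kv.get(key, 0) == expected:
            self.kv[key] = new
            return True
        return False

    def batch_write(self, items):
        """Batching primitive: apply several writes as one unit."""
        self.kv.update(items)

def append(store, entry):
    """Composed interface: reserve the tail position atomically, then write."""
    while True:
        tail = store.kv.get("tail", 0)
        if store.cas("tail", tail, tail + 1):
            store.batch_write({f"entry.{tail}": entry})
            return tail

store = ObjectStore()
print(append(store, "x"), append(store, "y"))  # -> 0 1
```

The point is the shape of the design, not the code itself: the append service is built entirely out of primitives the storage system already hardens.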
Programmability survey in Ceph (data interfaces)
14
App-specific interfaces
Interface groupings
Programmability in Ceph (reusable data interfaces)
15
Method Examples
App-specific interfaces
Interface groupings
Developers willing to break layers and use non-standard APIs
Outline and contributions
16
Driving example: CORFU distributed shared-log
17
Balakrishnan et al., “CORFU: A Shared Log Design for Flash Clusters”, NSDI ’12
1 | 2 | 3 | 4 | | | | | | | | | | | | | | | | |
log striping
read
clients
Sequencer (P)
Sequencer (B)
1, 2, 3, 4, 5, ….
append
pos = seq++
send_msg(pos)
How can we implement CORFU with programmability?
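The CORFU protocol on the slide (pos = seq++; send_msg(pos); clients write directly under log striping) can be sketched as a small in-memory simulation. This is illustrative Python, assuming no failures and in-memory "flash units"; it is not the CORFU or ZLog implementation.

```python
class Sequencer:
    """Centralized counter: pos = seq++ for every append request."""
    def __init__(self):
        self.seq = 0

    def next_pos(self):
        pos = self.seq
        self.seq += 1
        return pos

class StripedLog:
    """Log positions striped round-robin across storage devices."""
    def __init__(self, num_devices):
        self.devices = [dict() for _ in range(num_devices)]  # pos -> entry

    def write(self, pos, entry):
        dev = self.devices[pos % len(self.devices)]
        assert pos not in dev, "write-once: position already filled"
        dev[pos] = entry

    def read(self, pos):
        return self.devices[pos % len(self.devices)].get(pos)

seq = Sequencer()
log = StripedLog(num_devices=4)
for entry in ["a", "b", "c", "d", "e"]:
    log.write(seq.next_pos(), entry)

print(log.read(0), log.read(4))  # -> a e
```

Note that the sequencer only hands out positions; data never flows through it, which is why the append path scales with the number of devices.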
Driving example: CORFU distributed shared-log
18
ZLog: CORFU on Ceph
Malacology
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... |
?
3 x {entry, metadata}
ℕ:𝕄
Design space: ZLog design on Ceph
19
metadata \ entry | omap | bytestream | xattr
omap             |      |            |
bytestream       |      |            |
xattr            |      |            |
Are these combinations worth exploring?
Append throughput for 128-byte log entries
20
Design space: ZLog design on Ceph
21
metadata \ entry | omap | bytestream | xattr
omap             |      |            | No
bytestream       |      |            |
xattr            | ???  | ???        |
Append performance for a variety of entry sizes
22
Append size (bytes)
if then else
4K exception (config!)
Design space: ZLog design on Ceph
23
metadata \ entry | omap | bytestream | xattr
omap             |      |            | No
bytestream       | No   |            |
xattr            | ???  | ???        |
What about extended attributes (xattr)?
24
metadata \ entry | omap | bytestream
xattr            | ???  | ???
What am I missing?
Are my other decisions incorrect?
Interest in ZLog from MegaCorp®
25
Acceptance of programmable storage paradigm
26
“... we are considering large scale deployments of Ceph ...“
“... Zlog seems more attractive as its on the same technology stack.”
Many use cases; tail latency is universally important
27
Tail latency in Ceph isn’t good, but interfaces matter!
28
Interfaces affect tail latency
Design space: ZLog design on Ceph
29
metadata \ entry | omap | bytestream | xattr
omap             |      |            | No
bytestream       | No   |            |
xattr            | No   | No         |
Navigating the design space is an obstacle
30
Outline and contributions
31
How to grow a database: scale-up approach
33
Database Node
CPU
RAM
Database Storage
Network / Bus
Q
https://aws.amazon.com/ec2/instance-types/
Skyhook: exploit storage resources
34
Database Node
CPU
RAM
Database storage
Network / Bus
Database storage
Network
Q
Q
Q
Q
(Q)
Skyhook project
Single-node architecture
Database Node
CPU
RAM
Q
Skyhook architecture
Programmable storage
Skyhook: aligns data with storage interfaces
35
DB-specific data interface on every Ceph OSD: each OSD in the Ceph cluster has CPU, RAM, and Storage+Index
A table (columns C1, C2, C3) is partitioned into shards, each stored as an object { object.i }
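The partitioning step above can be sketched in a few lines. This is an illustrative Python sketch of the layout idea (table rows split into fixed-size shards, one object per shard, named `object.i`); the function name and object naming are hypothetical, not the real Skyhook API.

```python
def partition(rows, shard_size):
    """Split a list of rows into named objects of at most shard_size rows."""
    return {
        f"object.{i}": rows[off:off + shard_size]
        for i, off in enumerate(range(0, len(rows), shard_size))
    }

# Ten toy lineitem-like rows, split into objects of four rows each.
rows = [{"orderkey": k, "extendedprice": 1000.0 * k} for k in range(10)]
objects = partition(rows, shard_size=4)
print(sorted(objects))           # -> ['object.0', 'object.1', 'object.2']
print(len(objects["object.2"]))  # -> 2
```

Because each shard lives in its own object, a predicate can later be evaluated wherever that object is stored, rather than at the database node.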
Programmability used in a completely different way than building a log abstraction
Database node (CPU, RAM) accesses the cluster through Foreign Data Wrappers
App-specific interface pushes down: indexing, projection, filtering, aggregation
Skyhook experiments with programmable storage
36
Database Node
CPU
RAM
Programmable storage
Network
(Database-specific data interface)
Q
Q
Q
Q
Benchmark queries evaluated
Qa: Range query with 10% selectivity:
    SELECT * FROM lineitem WHERE extendedprice > 71000.0
Qb: Point query (unique row), issued with and without an index:
    SELECT extendedprice FROM lineitem WHERE orderkey=5 AND linenumber=3
Qc: Regex query with 10% selectivity (CPU-intensive):
    SELECT * FROM lineitem WHERE comment ILIKE '%uriously%'
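The difference the experiments measure can be simulated directly: client-side processing ships every row over the network before filtering, while server-side processing evaluates the predicate where the object lives and ships only matches. This is a toy Python sketch with made-up rows, not Skyhook code; the row counts stand in for network traffic.

```python
import re

# Four toy lineitem rows (values invented for illustration).
lineitem = [
    {"extendedprice": p, "comment": c}
    for p, c in [(70000.0, "furiously final"), (72000.0, "quietly"),
                 (71500.0, "curiously bold"), (69000.0, "plain")]
]

def client_side(objects, pred):
    """Ship all rows to the client, then filter: returns (rows shipped, matches)."""
    shipped = [row for obj in objects for row in obj]
    return len(shipped), [r for r in shipped if pred(r)]

def server_side(objects, pred):
    """Filter at the storage server, ship only matches."""
    matches = [r for obj in objects for r in obj if pred(r)]
    return len(matches), matches

qa = lambda r: r["extendedprice"] > 71000.0          # Qa-style range predicate
qc = lambda r: re.search("uriously", r["comment"])   # Qc-style regex predicate

print(client_side([lineitem], qa)[0])  # -> 4 rows cross the network
print(server_side([lineitem], qa)[0])  # -> 2 rows cross the network
```

For a selective predicate, the server-side path moves a fraction of the data; for a CPU-intensive predicate like Qc, it also moves the computation onto the storage servers' CPUs.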
37
+ data loading
Range query performance (10% selectivity)
38
Improved I/O performance
Database Node
CPU
RAM
Database Storage
Network
Lower is better
Client-side processing
Server-side processing
Bulk-load and index generation performance
39
Internal data structures don’t handle bulk inserts efficiently
Per-object overheads accumulate
Table
Shards
Storage+Index
{ object.i }
Lower is better
Point query performance (find unique row)
40
Database Node
CPU
RAM
Database Storage
Network
Lower is better
Client-side processing
Server-side processing
Server-side processing with index acceleration
Outline and contributions
41
What is durability?
43
Long-term storage
500 years
Driving example: CORFU distributed shared-log
44
Balakrishnan et al., “CORFU: A Shared Log Design for Flash Clusters”, NSDI ’12
log striping
Sequencer (P)
Sequencer (B)
1, 2, 3, 4, 5, ….
read
clients
append
Programmed data interface
Ceph data interfaces
Persistent media is only part of the bottleneck
45
SSD
DRAM
Software is a bottleneck
46
client req → network → queuing and scheduling → transactional context → persistent storage media
Along the path: { client operations }, concurrency control, replication, tiering, indexing, clone, CoW, error conditions
This I/O path is taken by all requests… regardless of need!
Goal: optimize for broad spectrum of request needs
State: intertwined with correctness-sensitive handling
The CORFU sequencer is a high availability service
47
Balakrishnan et al., “CORFU: A Shared Log Design for Flash Clusters”, NSDI ’12
log striping
Sequencer (P)
Sequencer (B)
1, 2, 3, 4, 5, ….
read
clients
append
Ceph already provides availability... recovery!
Function CORFU Sequencer Recover
  max = 0
  for each storage device:
    max = MAX(max, SEAL(device))
  return max
EndFunction
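The recovery rule can be sketched as a runnable simulation. This is an illustrative Python sketch, not ZLog code: sealing a device makes it reject writes from older epochs and report the highest position it has written, and the new sequencer resumes just past the global maximum.

```python
class Device:
    """Toy storage device holding a set of written log positions."""
    def __init__(self, positions):
        self.positions = set(positions)
        self.sealed_epoch = 0

    def seal(self, epoch):
        """Reject writes from epochs older than `epoch`; report max position."""
        self.sealed_epoch = epoch
        return max(self.positions, default=-1)

def recover_sequencer(devices, new_epoch):
    """Seal every device, then restart counting past the global maximum."""
    return max(d.seal(new_epoch) for d in devices) + 1

# Positions striped across four devices, as in the CORFU figure.
devices = [Device({0, 4}), Device({1, 5}), Device({2}), Device({3})]
print(recover_sequencer(devices, new_epoch=2))  # -> 6
```

Sealing first is what makes the maximum trustworthy: no in-flight append from the old epoch can land after the scan and invalidate the recovered counter.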
Data availability applied to sequencer interface
48
Seq @ OSD.0
Seq @ OSD.1
OSD.0 Failure
Seq @ OSD.2
OSD.1 Failure
Configurable timeouts
Configurable replicas
Outline and contributions
49
Metadata management
51
App
App
App
Storage system
File
Object
mount/
POSIX file resource
Config DB
Metadata management
52
App
App
App
Storage system
File
Object
A
B
C
mount/
POSIX file resource
Config DB
Need: naming, metadata storage, etc.
POSIX namespace management of all interfaces
53
/users/
mount/
/science/
Metadata cluster
concurrency control
capabilities
security
cache management
POSIX/File
inode
inode
/log-instances/
Instances of ZLog used by applications
/log-instances/
Programmable metadata management (file types)
54
/users/
mount/
/science/
/log-instances/
/log-instances/
inode
inode
Metadata cluster
concurrency control
capabilities
security
cache management
POSIX/File
ZLog Basic
ZLog Streaming
inode
inode
inode
inode
[Karpovich, ‘94]
Programmable metadata management (file types)
55
mount/
/log-instances/
inode
Metadata cluster
concurrency control
capabilities
security
cache management
ZLog Basic
inode
ZLog Interface
FS client
ZLog metadata
Naming / discovery
Sequencer (P)
Sequencer (B)
[Karpovich, ‘94]
Programmable metadata management (coherency)
56
mount/
/log-instances/
inode
Metadata cluster
concurrency control
capabilities
security
cache management
ZLog Basic
inode
ZLog Interface
FS client
ZLog metadata
Naming / discovery
ZLog Interface
FS client
ZLog Interface
FS client
Cache invalidation protocol
Can enforce exclusive access to metadata
pos = seq++
The capability-based sequencer is round-robin
57
ZLog Interface
FS client
ZLog Interface
FS client
Compared to a centralized architecture, rotating the capability can benefit bursty workloads
Policies: best effort, delay, quota
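The round-robin idea can be sketched as a small simulation. This is illustrative Python, not the CephFS capability protocol: an exclusive capability rotates among clients, and while a client holds it, appends are local (pos = seq++). A quota-style policy is shown, where the holder must yield after a fixed number of positions; the other policies (best effort, delay) would change only the yield condition.

```python
class CapabilitySequencer:
    """Rotate an exclusive sequencer capability among clients, round-robin."""
    def __init__(self, clients, quota):
        self.clients, self.quota = clients, quota
        self.seq, self.holder, self.used = 0, 0, 0

    def append(self, client):
        if client != self.clients[self.holder]:
            return None                        # must wait for the capability
        pos = self.seq                         # local: pos = seq++
        self.seq += 1
        self.used += 1
        if self.used == self.quota:            # quota policy: yield and rotate
            self.holder = (self.holder + 1) % len(self.clients)
            self.used = 0
        return pos

s = CapabilitySequencer(["A", "B"], quota=2)
print([s.append("A") for _ in range(3)])  # -> [0, 1, None]
print(s.append("B"))                      # -> 2
```

A bursty client amortizes one capability grant over many appends; the trade-off measured on the next slide is that other clients wait longer for their turn.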
Finding trade-offs with the capability-based sequencer
58
better
Throughput per Policy
Latency per Policy
More opportunities to receive the shared resource capability
Outline and contributions
59
Programmability looks like a one-time cost
61
WRONG! Programmability is not a one-time cost
ZLog performance is a toss-up on the 2014 version of Ceph
62
Performance Comparison of 4 Designs (Appends/Sec, Ceph 2014)
Takeaway: clear ZLog implementation choice in 2016
63
Performance Comparison of 4 Designs (Appends/Sec): Ceph 2014 vs. Ceph 2016
That is the state of programmable storage
64
Ceph programmability 2010 to 2016
65
Ceph programmability usage since 2016
66
61%
Programmability is critical to the Ceph ecosystem
When asked about the fate of object classes given the redesign opportunity:
67
“cls [object classes] isn't going away... it's proven pretty important for all of RGW, RBD, and CephFS... It has proven extremely useful and it's also a clean way to incorporate logic during updates without slowing down the I/O pipeline (mostly!).”
-- Sage Weil, lead architect of Ceph
Popping up in different places
68
Swift
Storlets
Microsoft
Amazon
Outline and contributions
69
Declarative storage
71
Query optimization & plan generation
Cost model
In-vivo storage system development
72
Outline and contributions
73
Recap and conclusions
75
Future work
76
Storage system architecture
Developer assistance
Declarative storage
Publications
77
HotStorage ’17: DeclStore: Layering is for the Faint of Heart. N. Watkins, M. Sevilla, I. Jimenez, K. Dahlgren, P. Alvaro, S. Finkelstein, and C. Maltzahn
EuroSys ’17: Malacology: A Programmable Storage System. M. Sevilla, N. Watkins, I. Jimenez, P. Alvaro, S. Finkelstein, J. LeFevre, and C. Maltzahn
HotStorage ’16: ZEA, A Data Management Approach for SMR. A. Manzanares, N. Watkins, C. Guyot, D. LeMoal, C., and Z. Bandic
PDSW ’15: Automatic and Transparent I/O Optimization With Storage Integrated Application Runtime Support. N. Watkins, Z. Jia, G. Shipman, C. Maltzahn, A. Aiken, and P. McCormick
SC ’15: Mantle: A Programmable Metadata Load Balancer for the Ceph File System. M. Sevilla, N. Watkins, C. Maltzahn, I. Nassi, S. Brandt, S. Weil, G. Farnum, and S. Fineberg
BDMC ’13: In-Vivo Storage System Development. N. Watkins, C. Maltzahn, S. Brandt, I. Pye, and A. Manzanares
PDSW ’12: DataMods: Programmable File System Service. N. Watkins, C. Maltzahn, S. Brandt, and A. Manzanares
SC ’11: SciHadoop: Array-based Query Processing in Hadoop. J. Buck, N. Watkins, J. LeFevre, K. Ioannidou, C. Maltzahn, N. Polyzotis, and S. Brandt
DADC ’09: Abstract Storage: Moving File Format-specific Abstractions Into Petabyte-scale Storage Systems. J. Buck, N. Watkins, C. Maltzahn, and S. Brandt
Thank you everyone
Committee: Carlos Maltzahn, Scott Brandt, Peter Alvaro, and other amazing collaborators: Neoklis Polyzotis, Jeff LeFevre, Shel Finkelstein, Ike Nassi, Kleoni Ioannidou
Michael Sevilla, Ivo Jimenez, Joe Buck, Dimitris Skourtis, Adam Crume
Pat McCormick, Galen Shipman, John Bent, Gary Grider, Adam Manzanares, Kleoni Ioannidou, Jay Lofstead, Sage Weil, Anna Povzner, Greg Farnum
78