zrepl

One-Stop ZFS Replication Solution

OpenZFS Dev Summit 2018

Problem Space

So let’s look at the problem space: what is ZFS replication about?

[Diagram: pg01.example.com has zroot/var/db/pg with snapshots @a, @b, @c, @d; backups.example.com has backups/pg01/zroot/var/db/pg with the same snapshots. Both sides run zfs list, the snapshot lists are diffed, and zfs send on pg01 is piped into zfs recv on backups.]

Snapshot Management

RPC & Diff Algorithm

Coordinate send & recv, Transport

Continuous Operation


Essentially, it is the job of snapshotting a ZFS filesystem on one host and then using zfs send & recv to push the changes between snapshots to a filesystem on another host.

To avoid confusion: I will use the term filesystem to refer to the thing you create with zfs create, so both filesystems and volumes, but not snapshots!

Let’s look at the example on the slide to get a bit more concrete:

Problem Space II

Multiple ZFS filesystems & zvols

Resumable send & recv

Other new ZFS send flags → interoperability

Untrusted receiver (third party storage service)

Operation: Logging, Monitoring, Alerting

Software Maintenance

Most importantly, often, you will want to replicate more than one filesystem.

You have your home directories on different ZFS filesystems than your root fs, and your virtual machine drives on separate zvols.

You’ll want to back all of them up.

This adds a layer of complexity to the tool: it needs to

check which filesystems are present on each side

perform some kind of mapping operation, and

perform different actions depending on whether the filesystem is replicated for the first time or whether incremental replication is possible (see the sketch below).
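To make that last point concrete, here is a minimal sketch in Go of the per-filesystem decision (hypothetical types and names, not zrepl’s actual diff algorithm): do a full send on first replication, otherwise an incremental send from the most recent common snapshot.

package diff

// Step describes what to send for one filesystem.
type Step struct {
    Full bool   // full send of To; otherwise incremental From -> To
    From string // empty if Full
    To   string
}

// Plan assumes both snapshot lists are ordered oldest to newest.
func Plan(sender, receiver []string) (Step, bool) {
    if len(sender) == 0 {
        return Step{}, false // nothing to send
    }
    if len(receiver) == 0 {
        // First replication: full send of the oldest snapshot,
        // incrementals can follow from there.
        return Step{Full: true, To: sender[0]}, true
    }
    onSender := make(map[string]bool, len(sender))
    for _, s := range sender {
        onSender[s] = true
    }
    // Find the most recent snapshot present on both sides.
    for i := len(receiver) - 1; i >= 0; i-- {
        if !onSender[receiver[i]] {
            continue
        }
        if receiver[i] == sender[len(sender)-1] {
            return Step{}, false // receiver is already up to date
        }
        return Step{From: receiver[i], To: sender[len(sender)-1]}, true
    }
    return Step{}, false // no common snapshot: incremental impossible
}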

Further, advanced ZFS features have come up in recent years, for example resumable send and receive.

However, it is not supported on all OpenZFS platforms, and our tool will need to check whether both sides of the replication support resumable send & recv before using it.

Then there is one thing I’m particularly excited about: encrypted ZFS send & recv:

With that feature, we can send the encrypted blocks in the snapshot stream, and thus do not have to trust the receiving side.

Think about the interesting possibilities this opens: something like tarsnap, but for ZFS!

However, the receiver (our tarsnap knockoff in this case!) must also ensure that the sender does not compromise their system.

This is a big chunk of work, and likely unsuitable to be implemented in shell.

However, it’s what I’m most excited about at the moment.

From an operational perspective, there are also lots of ways in which the current situation could improve:

both as a developer and a sysadmin, I want my replication tool to tell me what it is doing, through detailed log messages.

Also, I want a way to monitor it, and have some way for my monitoring system to alert me when things break.

The status quo in this area is a nightly mail from cron - I think there is tremendous room for improvement.

Finally, software maintenance. As pointed out before, I do not think shell is a terrible language from a software engineering perspective.

However, regardless of the implementation language, I want the replication tool to be written such that critical code paths can be tested, and that new ZFS features can be adopted without fear.

So...

zfs + ssh + awk + sed + grep + sh

are great for specific one-offs.

They will get in your way when writing a general-purpose tool that has a growing list of requirements and needs maintenance.

Given this fairly simple example, some in the audience might stand up and say that this task is so simple it could be easily automated in a shell script.

And I’m with you on this: The ZFS command line interface is one of the nicest ones I’ve ever used, and it’s great for use in scripts.

Our little example here could be done in 50 lines of shell.

However, it would not be a general-purpose solution, but a one-off hack.

And when looking around, there are some one-off hacks that tried to grow into full-fledged zfs replication tools.

For example, the popular zxfer tool is 2352 lines of shell, in a single file.

How come it’s practically unmaintained?

It is certainly not feature-complete. Even bookmark support, to my recollection, is missing completely.

And while it is always easy to blame the language, I will do so in this particular case:

it’s hard to write good, safe shell code to begin with, and maintaining large shell code bases is much harder.

Your average shell script will not have unit tests and probably little modularity.

People are afraid to make changes, and thus development and maintenance stall.

But language is not the only aspect. Our little example oversimplified things a bit.

zrepl

zrepl is an integrated solution for ZFS replication with a focus on ease-of-use and long-term maintainability.

Roadmap: Design | Demo | QA

Let’s see how zrepl addresses some of these aspects.

zrepl daemon

Runs on each side of the replication setup.

Daemons talk to each other via an RPC protocol.

[Diagram: a zrepl daemon on pg01.example.com (zroot/db/pg, snapshots @a, @b, ...) talks to a zrepl daemon on backups.example.com (storage/backups/pg01/zroot/db/pg, @a, ...) via the RPC protocol.]

In this section of the talk, we’ll start with a bird’s eye view and look at some of the details.

IDEA: demo first, then remainder of the talk? danger of fucking up the demo and then presenting an unconvincing argument afterwards, though.

Replication

Goal: push & pull replication without code duplication
Solution: replication code uses abstract sender & receiver

[Diagram, pull mode: the Replicator drives an abstract Sender, reached through a Server Endpoint on the remote side, and a local Receiver behind a Client Endpoint; both endpoints are backed by ZFS.]

[Diagram, push mode: the mirrored setup; the Sender is the local Client Endpoint, the Receiver the remote Server Endpoint.]
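In Go, the idea could be expressed roughly like this (hypothetical signatures for illustration, not zrepl’s exact interfaces); the replicator only ever talks to these two interfaces, so the same code drives both push and pull, and only the wiring decides which side is local ZFS and which is an RPC stub:

package replication

import "io"

// Sender is the source side of a replication, local or remote.
type Sender interface {
    ListFilesystems() ([]string, error)
    ListSnapshots(fs string) ([]string, error)
    // Send returns a zfs send stream for the given step.
    Send(fs, from, to string) (io.ReadCloser, error)
}

// Receiver is the destination side, local or remote.
type Receiver interface {
    ListFilesystems() ([]string, error)
    ListSnapshots(fs string) ([]string, error)
    // Receive consumes a send stream into the given filesystem.
    Receive(fs string, stream io.Reader) error
}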

Transports

Abstract transport interface:
authenticated, reliable channel

Plain TCP

TLS client auth

SSH

File descriptor passing via cmsg(3)

Forced command identifies client
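A rough sketch of such a transport abstraction in Go (hypothetical names, not zrepl’s actual API): whatever the mechanism, the result is a reliable byte channel plus the authenticated identity of the peer.

package transport

import (
    "io"
    "net"
)

// AuthConn is a reliable, bidirectional channel together with the
// authenticated identity of the client on the other end.
type AuthConn struct {
    io.ReadWriteCloser
    ClientIdentity string
}

// Listener is implemented by plain TCP (trusting peer addresses),
// TLS with client certificates, and the ssh/stdinserver mechanism.
type Listener interface {
    Accept() (*AuthConn, error)
    Addr() net.Addr
    Close() error
}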

[Diagram: the initiating zrepl daemon spawns ssh, which connects to sshd on the remote host; sshd runs the forced command zrepl stdinserver, which passes its file descriptors to the remote zrepl daemon via cmsg(3).]

# /root/.ssh/authorized_keys
command="zrepl stdinserver clientID" ...
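The fd-passing step can be sketched in a few lines of Go (an illustration of the SCM_RIGHTS/cmsg(3) mechanism under assumed paths and names, not zrepl’s actual code):

package stdinserver

import (
    "net"
    "os"
    "syscall"
)

// passStdio hands our stdin/stdout file descriptors to the daemon
// listening on a unix socket, using an SCM_RIGHTS control message.
func passStdio(sockPath string) error {
    conn, err := net.Dial("unix", sockPath)
    if err != nil {
        return err
    }
    defer conn.Close()
    uc := conn.(*net.UnixConn)
    oob := syscall.UnixRights(int(os.Stdin.Fd()), int(os.Stdout.Fd()))
    _, _, err = uc.WriteMsgUnix([]byte("stdio"), oob, nil)
    return err
}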

RPC protocol

[Diagram: wire format of one message, LEN | HDR | STRUCTURED | STREAM. The header carries string endpoint, int32 structLen, bool stream, string error. The STREAM part is a sequence of chunks, each prefixed with a uint32 len, terminated by len = 0 and a trailing status st.]

Requirements: high throughput, low overhead, single conn

Home-grown, request-response based

Structured data & byte stream (known vs unknown len)

Legend: STRUCTURED is protobuf; STREAM is the zfs send stream.
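To illustrate the stream part, a minimal Go sketch of the chunked framing (layout assumed from the diagram above: a big-endian uint32 length before each chunk, a zero length terminating the stream):

package frame

import (
    "encoding/binary"
    "io"
)

// WriteStream copies r to w as length-prefixed chunks and writes the
// zero-length terminator when r is exhausted.
func WriteStream(w io.Writer, r io.Reader) error {
    buf := make([]byte, 32*1024)
    var lenbuf [4]byte
    for {
        n, err := r.Read(buf)
        if n > 0 {
            binary.BigEndian.PutUint32(lenbuf[:], uint32(n))
            if _, werr := w.Write(lenbuf[:]); werr != nil {
                return werr
            }
            if _, werr := w.Write(buf[:n]); werr != nil {
                return werr
            }
        }
        if err == io.EOF {
            binary.BigEndian.PutUint32(lenbuf[:], 0)
            _, werr := w.Write(lenbuf[:])
            return werr
        }
        if err != nil {
            return err
        }
    }
}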

Endpoints

Sender endpoint: limits access to whitelisted filesystems

Receiver endpoint: limits access to the client’s subtree

Benefit of Endpoints: Multitenancy

➞ ZFS native encryption

➞ Receiving untrusted replication streams

[Diagram: the Replicator with Sender and Receiver endpoints backed by ZFS, as on the previous slide.]
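A hypothetical sketch of the receiver-side confinement (invented names; zrepl’s real mapping logic differs): every filesystem path coming over the wire is re-rooted under a per-client prefix, and malformed paths are rejected.

package endpoint

import (
    "fmt"
    "strings"
)

// Receiver receives into a subtree dedicated to one client.
type Receiver struct {
    RootFS string // e.g. "storage/backups/pg01"
}

// MapToLocal maps a sender-side filesystem path to the local dataset
// the stream may be received into; it cannot escape RootFS.
func (r *Receiver) MapToLocal(senderFS string) (string, error) {
    clean := strings.Trim(senderFS, "/")
    if clean == "" || strings.Contains(clean, "..") {
        return "", fmt.Errorf("invalid filesystem path %q", senderFS)
    }
    return r.RootFS + "/" + clean, nil
}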

Snapshot Management

filesystems: {
  "<": true,
  "zroot/tmp": false
}
snapshotting:
  snapshot_prefix: zrepl_
  interval: 1h
pruning:
  keep_sender:
  - type: not_replicated
  - type: last_n
    count: 48
  keep_receiver:
  - type: grid
    grid: 24x1h (keep=all) | 24x1h | 40x1d | 6x30d

On the sender, create snapshots in fixed intervals

Keep union of snapshots matched by keep rules

Tracked via a cursor bookmark


Retention Grid (aka “grandfathering”)

Adjacent intervals, from now into past

Keep at most `$keep` snapshots per specified interval

Default: keep=1

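A compact sketch of how such a grid can be evaluated (hypothetical types, not zrepl’s implementation): walk the intervals from now into the past and keep at most `keep` snapshots per interval.

package prune

import (
    "sort"
    "time"
)

// Interval is one cell of the retention grid.
type Interval struct {
    Length time.Duration
    Keep   int // -1 means keep all snapshots in this interval
}

// Destroys returns the snapshot creation times NOT kept by the grid.
// Intervals are adjacent and extend from `now` into the past; within
// each interval, at most Keep snapshots (the newest) survive.
func Destroys(now time.Time, snaps []time.Time, grid []Interval) []time.Time {
    // Newest first, so we can walk grid and snapshots in lockstep.
    sort.Slice(snaps, func(i, j int) bool { return snaps[i].After(snaps[j]) })
    i := 0
    for i < len(snaps) && snaps[i].After(now) {
        i++ // snapshots from the future are left alone
    }
    var destroy []time.Time
    lo := now
    for _, iv := range grid {
        hi := lo
        lo = lo.Add(-iv.Length)
        kept := 0
        for ; i < len(snaps) && !snaps[i].After(hi) && snaps[i].After(lo); i++ {
            if iv.Keep >= 0 && kept >= iv.Keep {
                destroy = append(destroy, snaps[i])
            } else {
                kept++
            }
        }
    }
    // Everything older than the whole grid is destroyed.
    return append(destroy, snaps[i:]...)
}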

New ZFS Features

Most can be used by default if both sides support them

Require support on both sender & receiver

resumable send & recv: -t, -s

compressed send & recv: -c

deduplicated send & recv: -D

large_blocks: -L

WRITE_EMBEDDED: -e
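Negotiation can be pictured like this (a hypothetical Go sketch; zrepl’s actual mechanism may differ): each side reports which features it supports, and a send flag is only used if it is in the intersection.

package features

// flagForFeature maps feature names to the zfs send/recv flags from
// the list above (the feature names here are made-up identifiers).
var flagForFeature = map[string]string{
    "resumable":      "-s", // recv -s; the sender resumes with -t
    "compressed":     "-c",
    "deduplicated":   "-D",
    "large_blocks":   "-L",
    "write_embedded": "-e",
}

// NegotiatedFlags returns the flags supported by both peers.
func NegotiatedFlags(sender, receiver map[string]bool) []string {
    var flags []string
    for feat, flag := range flagForFeature {
        if sender[feat] && receiver[feat] {
            flags = append(flags, flag)
        }
    }
    return flags
}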

New ZFS Features II

encrypted (raw) send & recv: -w

➞ Untrusted receiver, zrepl as a service

➞ Requires auditing / hardening of the recv path

redacted send & recv: --redact

➞ Exclude ~/.cache, ~/Downloads, etc. from backups

➞ Complicates replication logic

Operation

Extensive documentation

Structured logging

zrepl status (progress view)

Prometheus endpoint

https://zrepl.github.io

Logging:

zrepl daemon logs a lot and uses the common log levels to discern debug messages from critical warnings, etc.

It is highly configurable: you can configure multiple outlet types (stdout, syslog, TCP + TLS)

and log formats, ranging from a nice human-readable format over logfmt to JSON messages if you need machine-readable logs.

There is also a subcommand, zrepl control status, that connects to the local control socket exposed by zrepl daemon.

You can see all jobs and the individual subtasks, inspect the most recent log messages filtered by task and log level.

The communication between ‘zrepl control status’ and ‘zrepl daemon’ is plain JSON, which you can also get direct access to via a command line flag, and this is a good starting point for implementing something like a nagios check.

If that’s not enough, there is also an option to expose Prometheus endpoints via HTTP, which include the log message counts per log level and job, so you can be alerted using Prometheus Alert Manager if that fits your infrastructure better.
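For illustration, wiring up such an endpoint with the official Go client library takes only a few lines (the metric name and labels are made up here; this is not zrepl’s actual code):

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// logMsgs counts log messages per level and job (hypothetical metric).
var logMsgs = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "zrepl_log_messages_total",
        Help: "Number of log messages, by level and job.",
    },
    []string{"level", "job"},
)

func main() {
    prometheus.MustRegister(logMsgs)
    logMsgs.WithLabelValues("error", "push_backups").Inc()
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":9811", nil)
}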

Lastly, for debugging purposes, there is also the ‘zrepl control pprof’ subcommand, which allows profiling and tracing of the zrepl daemon at runtime. So if you ever run into a problem, don’t just blindly restart zrepl, but try getting some useful debug information out first!

Maintainability

Implementation in Go

Reasonable Performance

Somewhat typesafe

Tests for critical code

Great tooling

$ GOOS=freebsd go build -o zrepl-freebsd

$ GOOS=linux go build -o zrepl-linux

$ GOOS=solaris go build -o zrepl-solaris

$ GOOS=darwin go build -o zrepl-osx

Demo / Tutorial

Scenario: push backup

[Diagram: prod1.example.com runs a push job for zroot/var/db, zroot/usr/home/bob, and zroot/usr/home/alice; zroot/usr/home/paranoid is excluded (X). backups.example.com runs a sink job that receives into storage/zrepl/push/prod1/var/db, storage/zrepl/push/prod1/usr/home/bob, and storage/zrepl/push/prod1/usr/home/alice.]
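Pieced together from the config options shown earlier, the two job definitions might look roughly like this (a hypothetical sketch, not necessarily zrepl’s exact syntax; the job names and the root_fs key are assumptions). On prod1.example.com:

jobs:
- name: push_to_backups      # hypothetical name
  type: push
  filesystems: {
    "zroot/var/db": true,
    "zroot/usr/home<": true,
    "zroot/usr/home/paranoid": false
  }
  snapshotting:
    snapshot_prefix: zrepl_
    interval: 1h

And on backups.example.com, a sink job that confines each client to its own subtree:

jobs:
- name: sink_prod1           # hypothetical name
  type: sink
  root_fs: storage/zrepl/push # hypothetical key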

Roadmap

Merge refactoring of replication + push support + RPC

Resumable + encrypted send & recv

Other ZFS send features

Properties sync

?

OpenZFS wishlist

Receiving from untrusted sources (no mounting)

Ignore properties in snapshot stream (zfs recv -x all)

zfs(8) interface to decode receive_resume_token

zfs(8) feature discovery interface

Please try zrepl and give feedback!

Wait until refactoring is merged ¯\_(ツ)_/¯

Like it? I need help!

Docs: zrepl.github.io

Dev: github.com/zrepl/zrepl

Christian Schwarz

me@cschwarz.com

@problame
