1 of 25

Composing and Decomposing Op-Based CRDTs with Semidirect Products

Matthew Weidner, Heather Miller, and Christopher Meiklejohn

Carnegie Mellon University

Pittsburgh, PA, USA

ICFP 2020

Virtual Jersey City, NJ, USA

8/24-26/2020

1

SPLASH 2022

Auckland, NZ

8 Dec 2022

2 of 25

2

Composing and Decomposing Op-Based CRDTs with Semidirect Products

3 of 25

An alternative: CRDTs (Conflict-free Replicated Data Types)

3

// Using Legion CRDT library

var legion = new Legion(...);

legion.joinGroup(...); // sets objectStore

var shopping = objectStore.getOrCreate(

"Shopping", legion.Set

);

shopping.add("milk");

Update local state immediately

A “PL” way of solving a systems problem.

Lazily broadcast the operation to other users/replicas

Composing and Decomposing Op-Based CRDTs with Semidirect Products

4 of 25

Pros

Cons

  • Need to design conflict-resolution alg for each data type (difficult & ad-hoc)
    • Restricted to a few common data types with restricted interfaces
  • Simple client-side programming model
  • Local-first
  • Supports end-to-end encrypted webapps
  • Data shared in small groups only (<100)
  • Not for data that needs locking (e.g. money)

4

Composing and Decomposing Op-Based CRDTs with Semidirect Products

5 of 25

In this work

Composition

Decomposition

We present a compositional design technique for (op-based) CRDTs, the semidirect product.

  • Counter → add mult ops
  • List → add reverse op
  • (Unique) Set → add for-each op
  • Set → two simpler (commutative) data types

5

(same state type)

(union of supported ops)

Composing and Decomposing Op-Based CRDTs with Semidirect Products

6 of 25

6

Make Ops Commute

  • +=3, +=5 instead of =3, =5

Reason about Causality

  • Alice: “add(‘milk’)”
  • Bob (later): “remove(‘milk’), but only after Alice’s op”
  • Result: no one has “milk”.

Update local state immediately

Broadcast the op to other users

Alice: value = 3

Bob: value = 5

Conflict resolution techniques

Composing and Decomposing Op-Based CRDTs with Semidirect Products

7 of 25

Causality in detail

The causal order on operations is a natural partial order < on operations:

CRDTs can query the causal order. Ex. Add-Wins Set CRDT: x is in the set if there is an add(x) operation that is not < any remove(x) operation.

  • So milk stays on the list.

7

add(‘milk’)

add(‘eggs’)

add(‘bread’)

add(‘milk’)

Alice

Bob

Charlie

remove(‘milk’)

<

<

<

concurrent

Composing and Decomposing Op-Based CRDTs with Semidirect Products

8 of 25

8

Reason about Causality

  • E.g. for a set: specify “x is in the set if there is an add(x) operation that is not < any remove(x) operation.”

Make Ops Commute

  • +=3, +=5 instead of =3, =5

Conflict resolution techniques

Our semidirect product provides a uniform, reusable way of doing this, so you don’t have to.

New CRDT design workflow: Design commutative data types for simple subsets of operations → Glue together with semidirect products.

Our experience: works for many existing CRDT types, plus novel ones.

Composing and Decomposing Op-Based CRDTs with Semidirect Products

9 of 25

An example of our semidirect product

(int register w/ add ops) + (int register w/ mult ops) → int register with both

  • add operations alone are easy: they already commute.
  • Same for mult operations alone.
  • But they conflict with each other:

9

Dave

Mary

Composing and Decomposing Op-Based CRDTs with Semidirect Products

10 of 25

Let us choose that in the face of concurrency, “adds go before mults”.

Observation: If we receive concurrent add and mult operations in the “wrong” order (mult before add), we get the right result if we act on the add with the mult.

Goal: Integer register CRDT supporting add and mult operations

  • Works by distributive law:

10

Dave

Mary

Dave

Mary

Composing and Decomposing Op-Based CRDTs with Semidirect Products

11 of 25

Goal: Integer register CRDT supporting add and mult operations

When you receive an add operation, transform it by all concurrent mult operations you’ve already received.

  • So becomes , where are all of the operations you’ve previously received that are concurrent to .
  • Works more generally subject to algebraic constraints on the input CRDTs & action
    • rich text format vs insert; shapes translate vs group-rotate

When you receive an add operation, transform it by all concurrent mult operations you’ve already received.

11

Dave

Mary

Composing and Decomposing Op-Based CRDTs with Semidirect Products

12 of 25

New since ICFP 2020

12

SPLASH 2022

Composing and Decomposing Op-Based CRDTs with Semidirect Products

13 of 25

What’s the meaning of this?

James Riely (DePaul), paraphrased: How does the semidirect product relate to the behavior of some sequential data type spec?

Me in 2020: 🤷

  • “adds before concurrent mults” can make cycles (no matching sequential trace); different papers handle these differently

Me in 2022: Maybe it’s not really about synthesizing new CRDTs; it’s a (premature?) optimization of one particular CRDT: the Unique Set with For-Each Ops

13

Jagadeesan & Riely, “Eventual Consistency for CRDTs”, ESOP 2018

Liang & Feng, “Abstraction for conflict-free replicated data types”, PLDI 2021

De Porre et. al, “ECROs: building global scale systems from sequential code”, OOPSLA 2021

SPLASH 2022

Composing and Decomposing Op-Based CRDTs with Semidirect Products

14 of 25

Unique Set with For-Each Ops

“Unique Set” CRDT: ≈ list CRDT w/o order (but similar remarks apply to list CRDTs)

  • Set of elements w/ reference identity
  • Ops: add; delete

Unique set of CRDTs: values are themselves CRDTs

  • Ops: add/new; delete/free; internal op on value CRDT

14

SPLASH 2022

Ex: Rich Text CRDT�= List of Rich Char CRDTs

Lorum ipsem dolor

Composing and Decomposing Op-Based CRDTs with Semidirect Products

15 of 25

Unique Set with For-Each Ops

Sequentially, we can also do: “for each element of a Unique Set, apply op O”.

  • E.g. apply “bold” op to each rich char in some range

15

SPLASH 2022

Collaboratively, we can do: “for each element of a Unique Set of CRDTs, including ones added concurrently, apply op O”.

  • “causal+concurrent for-each op”

Composing and Decomposing Op-Based CRDTs with Semidirect Products

16 of 25

Unique Set with For-Each Ops

This is an example of a semidirect product: arbitration order is add before forEach.�But now we have an obvious semantics (not just “whatever the semidirect product says”).

Empirical observation: many “useful” semidirect products are semantically equivalent to some Unique Set of CRDTs w/ causal+concurrent for-each ops.

16

SPLASH 2022

Literal semidirect products (e.g. add/mult) optimize state size, but maybe not worth conceptual overhead.

Composing and Decomposing Op-Based CRDTs with Semidirect Products

17 of 25

Implementation: Collabs

  • New TypeScript CRDT library for collaborative apps
    • Compare to: Automerge, Yjs
  • Supports custom CRDTs and composition techniques
    • Both necessary for semidirect products
    • No previous library has both
  • Preprint: http://arxiv.org/abs/2212.02618

17

SPLASH 2022

Huairui Qi

CMU (M.S. 2020)

Ignacio Maronna

CMU (M.S. 2020)

Maxime Kjaer

EPFL/CMU (M.S. 2021)

Ria Pradeep

CMU (B.S. 2021)

Benito Geordie

Rice

Composing and Decomposing Op-Based CRDTs with Semidirect Products

18 of 25

Demo apps: Rich Text Editor

18

SPLASH 2022

Composing and Decomposing Op-Based CRDTs with Semidirect Products

19 of 25

Demo app: Rich Text Editor

19

SPLASH 2022

  • Sequentially: new char inherits formatting from its left origin
  • Semidirect arbitration order: “format before insert”, so concurrent bolding is inherited

Benito Geordie

Rice

Composing and Decomposing Op-Based CRDTs with Semidirect Products

20 of 25

Demo app: Shapes POC

20

SPLASH 2022

  • Plausible for Powerpoint?
  • Semidirect arbitration order: “indiv translate before group rotate”
  • Technically a multiple semidirect product (translate, rotate, reflect)

Ria Pradeep

CMU (B.S. 2021)

Composing and Decomposing Op-Based CRDTs with Semidirect Products

21 of 25

Recap

Thanks for watching :)

  • CRDTs are a nice “PL” way of building systems that act on shared state
    • More info: https://crdt.tech/
  • Our semidirect product is a compositional design technique for CRDTs
    • New CRDT design workflow: Design commutative data types for subsets of ops → Glue together with semidirect products
  • Implemented in our TypeScript CRDT library, Collabs: https://collabs.readthedocs.io/
  • Maybe just need causal+concurrent for-each ops on a Unique Set of CRDTs?

21

Questions?

Image credits: zombo.com, openclipart.org

More links: mattweidner.com

Composing and Decomposing Op-Based CRDTs with Semidirect Products

22 of 25

22

Composing and Decomposing Op-Based CRDTs with Semidirect Products

23 of 25

Relation to Operational Transformation

Recall: OT is a competing approach to eventually consistent replicated data types. Instead of making concurrent operations commute, you define a transformation function tf_1: O x O -> O.

  • Received operations are transformed against previously-received concurrent operations using the (somewhat complicated) adOPTed algorithm (Ressel et al. ‘96).
  • Eventual consistency follows from 2 Transformation Properties, which are hard to verify (esp. TP2).

The semidirect product is the special case of OT when O = O1O2 and tf1(p, q) = p except when p O1, q O2. (O2 ops transform concurrent O1 ops, but that’s it.)

  • With these restrictions, adOPTed becomes the simple algorithm described for the add/mult integer register, and TP2 is typically trivial to verify.

It is interesting that we get common CRDT semantics from this special case of OT.

23

Composing and Decomposing Op-Based CRDTs with Semidirect Products

24 of 25

Generality

Q: When can a CRDT be decomposed as a semidirect product of simpler CRDTs?

Let C be a CRDT in the POLog model of pure op-based CRDTs. I.e., C is defined by a function f mapping the partially ordered log of operations to a visible state.

Suppose the ops of C can be partitioned as O1 U O2 so that for any POLog L, f(L) = f(L’) whenever L = L’ except that we have added some edges o1< o2 with o1 ∊ O1, o2 ∊ O2. Then C is the semidirect product of a CRDT with only O1 ops and a CRDT with only O2 ops.

This expresses the rule that O1 ops “go before” concurrent O2 ops.

24

Composing and Decomposing Op-Based CRDTs with Semidirect Products

25 of 25

Name Origin: Semidirect Product of Groups

In abstract algebra, a group is a set G with a binary operation ∘ : G x G → G satisfying multiplication-like properties.

  • ∘ is associative (g ∘ (h ∘ k) = (g ∘ h) ∘ k); ∃ 1 ∊ G s.t. g ∘ 1 = 1 ∘ g = g; ∀g ∊ G ∃ g-1 ∊ G s.t. g ∘ g-1 = 1.
  • I.e., a monoid with inverses.

The semidirect product combines two groups G, H into a group with set G x H. It contains copies of G and H which commute as

h ∘ g = (h ⊳ g) ∘ h

for a suitable action ⊳ of H on G.

The semidirect product of CRDTs uses an analogous rule to reorder concurrent ops from the components.

25

Composing and Decomposing Op-Based CRDTs with Semidirect Products