1 of 90

Multi-Version Concurrent Data Structures�

PANAGIOTA FATOUROU

University of Crete, Department of Computer Science

Foundation for Research and Technology - Hellas

School on the Practice and Theory of Distributed Computing, November 2023

2 of 90

Multiversion Objects

    • A multiversion object maintains its previous versions, �so threads can have access to the history of the object (i.e., to its previous values).

SPTDC 2023 PANAGIOTA FATOUROU

5

10

Current version

Write(10)

5

Current version

Write(8)

8

10

Current version

5

2

3 of 90

Multiversioning

SPTDC 2023 PANAGIOTA FATOUROU

  • Multiversioning is widely used:

Database systems

  • Software Transactional Memory �[Fernandes et al. PPoPP’11] [Lu et al. DISC’13]

  • Concurrent data structures [Fatourou et al. SPAA’19] [Wei et al. PPoPP’21] [Kobus et al. PPoPP’22] [Sheffi et al. OPODIS’22]

3

4 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Many applications require querying large portions or multiple parts of the data structure.

Big-data applications use shared in-memory tree-based data indices

  • Fast data retrieval
  • Useful data analytics

Why Multiversioning?

4

5 of 90

Why multivesion Concurrent Data Structures?

SPTDC 2023 PANAGIOTA FATOUROU

Concurrent Data Structure built out of CAS objects

Take a snapshot of the data structure

Answer multi-point Queries using snapshot

The vCAS technique

  • Yuanhao Wei, Naama Ben-David, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert, and Yihan Sun: Constant-Time Snapshots with Applications to Concurrent Data Structures, PPoPP 2021.

Snapshot: Saves a read-only version of the state of the data structure at a single point in time. [An atomic view of the state of the data structure.]

5

6 of 90

Background Knowledge

SPTDC 2023 PANAGIOTA FATOUROU

6

7 of 90

Model

SPTDC 2023 PANAGIOTA FATOUROU

Thread 1

Thread 2

Thread n

shared variables

ATOMIC boolean Compare&Swap(� Variable V, Value vold, Value vnew) { � � if (V == vold) { V = vnew; return TRUE; } � return FALSE; �}

  • The system is asynchronous.
  • Threads communicate by accessing shared variables.
  • In addition to Read and Write, a thread may execute an atomic CAS instruction on a shared variable.
  • Threads may fail by crashing.

7

8 of 90

Correctness [Herlihy & Wing]

Linearizability

  • In every execution α, each operation should have the same response as if it has executed serially (or atomically) at some point in its execution interval. This point is called linearization point of the operation.

SPTDC 2023 PANAGIOTA FATOUROU

8

9 of 90

Linearizability: Queue supporting ReadAll()

SPTDC 2023 PANAGIOTA FATOUROU

ReadAll()

*

*

time

?*

?*

?*

Deq()

Enq(3)

res= 1

ok

ok() -> {1,2}

Enq(1)

Enq(2)

ok

*

*

{1}

Queue:

{1, 2}

{2}

{2, 3}

*

9

10 of 90

SPTDC 2023 PANAGIOTA FATOUROU

ReadAll()

*

*

time

?*

?*

?*

X

Example of non-linearizable execution

Deq()

Enq(3)

res= 1

ok

{1,2,3}

Enq(1)

Enq(2)

ok

ok

*

*

Linearizability: Queue supporting ReadAll()

X

X

1

Head

2

Tail

res = {1,2}

{1}

Queue:

{1, 2}

{2}

{2, 3}

Set ReadAll() { // sequential alg

Node *q = Head;

Set res;

while (q != NULL) {

res = res ∪ {q->data};

q = q-> next; }

return res;

}

10

11 of 90

SPTDC 2023 PANAGIOTA FATOUROU

ReadAll()

*

*

time

X

Example of non-linearizable execution

Deq()

Enq(4)

res= 1

ok

{1,2,3}

Enq(1)

Enq(2)

ok

ok

*

*

Linearizability: Queue supporting ReadAll()

Head

2

3

Tail

Set ReadAll() {

Node *q = Head;

while (q != NULL) {

res = res ∪ {q->data};

q = q-> next; }

return res;

}

X

res = {1,2,3}

X

X

1

X

11

12 of 90

Progress

Non-blocking Algorithms

Wait-Freedom

  • Every thread finishes the execution of its operation within a finite number of steps.

Lock-Freedom

  • Some thread finishes the execution of its operation within a finite number of steps.

SPTDC 2023 PANAGIOTA FATOUROU

12

13 of 90

An Example of a Concurrent �Queue Implementation

SPTDC 2023 PANAGIOTA FATOUROU

13

14 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

struct node {

T value ; // unmutable

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *; // initially, both point

to a dummy node

14

15 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

struct node {

T value ; // unmutable

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *; // initially, both point

to a dummy node

Thread 1

Enqueue(1)

Read

Read

Michael & Scott Queue as an Example

15

16 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

1

struct node {

T value ; // unmutable

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *; // initially, both point

to a dummy node

Thread 1

Enqueue(1)

CAS

16

17 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

1

struct node {

T value ; // unmutable

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *; // initially, both point

to a dummy node

Thread 1

Enqueue(1)

17

18 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

1

struct node {

T value ; // unmutable

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *; // initially, both point

to a dummy node

Thread 1

Enqueue(1)

CAS

Michael & Scott Queue as an Example

18

19 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

1

struct node {

T value ; // unmutable

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *; // initially, both point

to a dummy node

Michael & Scott Queue as an Example

19

20 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

Head

1

2

Thread 2�Enqueue(2)

Thread 1

Enqueue(1)

CAS

CAS X

Michael & Scott Queue as an Example

20

21 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

Thread 1

CAS

21

22 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

22

23 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

CAS

23

24 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 1

Tail

CAS

24

25 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Tail

25

26 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

Thread 1 �is slow

Read

Read

26

27 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

Thread 1 �is slow

CAS

27

28 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

Thread 1�is slow

CAS

28

29 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Thread 2

Tail

Thread 1�is slow

CAS

29

30 of 90

Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Head

1

2

Tail

30

31 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Head

Thread 1

Deq()

Read

Michael & Scott Queue as an Example

Dummy

1

2

Tail

stores 1 as its return value

31

Read

Read

32 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Head

Thread 1

Deq()

Michael & Scott Queue as an Example

Dummy

1

2

Tail

CAS

32

33 of 90

SPTDC 2023 PANAGIOTA FATOUROU

Head

Michael & Scott Queue as an Example

Dummy

Dummy

1

2

Tail

33

34 of 90

vCAS Technique

SPTDC 2023 PANAGIOTA FATOUROU

Lock-free Queue [Michael & Scott’96]

Enqueue, Dequeue

Lock-free Snapshottable Queue

Enqueue �Dequeue

Snapshot

Range Query

i-th element

All elements

Simple, General, Efficient!

Preserves parallelism and time bounds

O(1) time, a single CAS

Wait-free,

Linearizable

Works with many lock-free data structures, including:

  • BST [Ellen,Fatourou, Ruppert, Breugel’10]
  • Linked List [Harris’01]
  • Chromatic Tree [BrownEllenRuppert’14]

Lock-free Concurrent Data Structure

Take a snapshot of the Data Structure

Answer multi-point Queries using snapshot

34

35 of 90

Overview of the VCAS Approach

SPTDC 2023 PANAGIOTA FATOUROU

CAS Object

Versioned CAS (vCAS) Object

Camera Object

Supports:

  • vRead
  • vCAS
  • readVersion

Supports:

  • Read
  • CAS

Supports:

TakeSnapshot

Time Complexity:

  • vRead(X)
  • vCAS(X, old, new)
  • takeSnapshot()
  • readVersion(X, S)

O(1) time, small constant

wait-free

35

Makes it possible for a thread to later read only the memory locations it needs from shared memory, knowning that all such reads will be atomic.

36 of 90

Supporting Multi-Point Queries

SPTDC 2023 PANAGIOTA FATOUROU

ts

Reads 0

0

1

2

Reads 1

Reads 1

Query thread

2

  • Each query calls TakeSnapshot to get a timestamp.
  • More than one queries may have the same timestamp.
  • Each query attempts to atomically increment ts using CAS.
  • Each version of a vCAS object has a timestamp, which has been read from ts.

36

37 of 90

Versioned CAS Implementation

SPTDC 2023 PANAGIOTA FATOUROU

VCAS Object

next

  • VCAS objects are represented internally using version lists. The fields of a Vnode (i.e., a node of a version list) are:
    • val
    • ts
    • vnext

37

nextv

8

val

nextv

4

val

nextv

0

val

Current value of next: Pointer to next node of queue

VNode

VNode

VNode

List ordered by timestamps, most recent first.

38 of 90

Versioned CAS Implementation

SPTDC 2023 PANAGIOTA FATOUROU

VCAS Object

next

38

nextv

8

val

nextv

4

val

nextv

0

val

previous values of next field of node

39 of 90

Versioned CAS Implementation

SPTDC 2023 PANAGIOTA FATOUROU

ts

vCAS(X, old, new)

  • Link in a new node with timestamp TBD
  • Update its timestamp

vRead(X)

  • Help update timestamp of most recent version
  • Return its value

takeSnapshot()

  • Attempt to increment ts using CAS
  • Return its previous value

readVersion(X, t)

  • Help update timestamp
  • Find newest version with timestamp t

VCAS Object

39

8

next

nextv

8

val

nextv

4

val

nextv

0

val

40 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

nextv

0

val

Thread 1�Enqueue(1)

nextv

0

val

vRead

vRead

struct node {

T value ;

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *;

struct node {

T value ;

vCAS Object next : struct node *;

}

vCAS objects Head, Tail: struct node *;

Head

nextv

0

val

0

ts

40

vRead(X)

  • Help update timestamp of most recent version of X
  • Return current value of X

41 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

nextv

0

val

1

Thread 1�Enqueue(1)

nextv

0

val

nextv

0

val

struct node {

T value ;

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *;

struct node {

T value ;

vCAS Object next : struct node *;

}

vCAS objects Head, Tail: struct node *;

Head

nextv

0

val

0

ts

41

42 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

nextv

0

val

1

Thread 1�Enqueue(1)

vCAS

nextv

0

val

nextv

0

val

nextv

-1

val

struct node {

T value ;

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *;

struct node {

T value ;

vCAS Object next : struct node *;

}

vCAS objects Head, Tail: struct node *;

Head

nextv

0

val

0

ts

42

vCAS(X, old, new)

  • Malloc() a new vNode with timestamp TBD (-1)
  • Link it in the version list of the vCAS object
  • Update its timestamp

43 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

struct node {

T value ;

CAS Object next : struct node *;

}

CAS objects Head, Tail: struct node *;

struct node {

T value ;

vCAS Object next : struct node *;

}

vCAS objects Head, Tail: struct node *;

nextv

0

val

1

Thread 1�Enqueue(1)

nextv

0

val

nextv

0

val

nextv

0

val

Head

nextv

0

val

0

ts

43

vCAS(X, old, new)

  • Malloc() and link in a new vNode with timestamp TBD (-1)
  • Make it the first node in the vlist of vCAS object
  • Update its timestamp

44 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

nextv

0

val

1

Thread 1

nextv

0

val

nextv

0

val

nextv

0

val

vCAS

nextv

0

val

Head

nextv

0

val

ts

0

44

45 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

nextv

val

1

Thread 1

nextv

0

val

nextv

0

val

nextv

0

val

Head

nextv

ts

val

ts

0

45

nextv

0

val

nextv

0

val

46 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

Tail

nextv

0

val

1

nextv

0

val

nextv

0

val

nextv

0

val

nextv

0

val

2

Thread 2

Enqueue(2)

nextv

ts

val

3

Thread 3

Enqueue(3)

nextv

ts

val

Thread 4

Head

nextv

ts

val

deq()

ts

0

46

47 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Version list of 1->next

Version list of Dummy-> next

Dummy

Tail

nextv

ts

val

0

0

1

nextv

ts

val

0

0

2

0

0

3

0

0

nextv

ts

val

nextv

ts

val

Head

nextv

ts

val

0

0

ts

0

47

nextv

ts

0

0

0

0

val

Version list of Tail

Version list �of Head

Version list of 2->next

Version list of 3->next

48 of 90

Multi-Point Queries:�Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

3

Thread 3

Enqueue(3)

nextv

ts

val

Thread 4

ts

ReadAll()

Dummy

Tail

nextv

ts

val

0

0

1

nextv

ts

val

0

0

2

0

nextv

ts

val

Head

nextv

ts

val

0

deq()

0

48

nextv

ts

val

0

0

0

Takesnapshot → 0

1

49 of 90

Multi-Point Queries: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

Dummy

nextv

ts

val

0

0

Dummy

nextv

ts

val

0

0

2

0

1

3

1

nextv

ts

val

nextv

ts

val

Head

nextv

ts

val

0

1

ts

1

49

Tail

0

0

0

1

50 of 90

Multi-Point Queries: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

ReadAll()

  • Executes Sequential Code, but…

Dummy

Tail

nextv

ts

val

0

0

1

nextv

ts

val

0

0

2

0

1

3

1

nextv

ts

val

nextv

ts

val

Head

nextv

ts

val

0

1

ts

1

50

0

0

0

1

Set ReadAll() {

Node *q = Head;

while (q != NULL) {

res = res ∪ {q->data};

q = q-> next; }

return res;

}

51 of 90

SPTDC 2023 PANAGIOTA FATOUROU

ReadAll()

*

*

time

X

Example of non-linearizable execution

Deq()

Enq(4)

res= 1

ok

{1,2,3}

Enq(1)

Enq(2)

ok

ok

*

*

Linearizability: Queue supporting ReadAll()

Head

2

3

Tail

Set ReadAll() {

Node *q = Head;

while (q != NULL) {

res = res ∪ {q->data};

q = q-> next; }

return res;

}

X

res = {1,2,3}

X

X

1

X

51

52 of 90

Multi-Point Queries: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

ReadAll()

  • Executes Sequential Code for ReadAll() but…
  • Uses ReadVersion(0) to read the values of vCAS objects as it goes

It returns {1,2}

Dummy

Tail

nextv

0

val

nextv

ts

val

nextv

0

val

nextv

0

val

nextv

1

val

0

0

1

nextv

ts

val

0

0

2

0

1

3

1

nextv

ts

val

nextv

ts

val

Head

nextv

ts

val

0

1

ts

1

52

readVersion(X, t)

  • Help update timestamp
  • Find newest version with time t

53 of 90

Overview of the VCAS Approach: �Michael & Scott Queue as an Example

SPTDC 2023 PANAGIOTA FATOUROU

void enq(T value ) {

NODE *next , *last ;

1. NODE *p = newcell(NODE) ;

// p->value = value ; p->next = NULL;

4. while (TRUE) {

5. last = Tail ;

  1. next = last->next ;

7. if (last != Tail) continue;

  1. if (next != NULL) {
  2. CAS( , last, next);
  3. continue;
  4. }

12. if (CAS( , NULL , p)) break ;

13. }

14. CAS( , last, p );

}

Tail

last->next

Tail

void enq(T value ) {

NODE *next , *last ;

1. NODE *p = new(NODE, value, NULL) ;

4. while (TRUE) {

5. last = vRead(Tail) ;

  1. next = vRead(last->next);

7. if (last != vRead(Tail)) continue;

  1. if (next != NULL) {
  2. vCAS( , last, next);
  3. continue;
  4. }

12. if (vCAS( , NULL , p)) break ;

13. }

14. vCAS( , last, p );

}

Tail

last->next

Tail

53

54 of 90

Versioned CAS on BSTs

SPTDC 2023 PANAGIOTA FATOUROU

A

B

C

D

E

A

5

5

4

B’

B

3

C

C’

5

D

5

E

root

root

2

1

A’

Snapshottable BST

Expanding out VCAS objects

Timestamps

54

Examples of queries

  • Range queries
  • Tree height
  • Smallest key that matches a condition
  • K-successors
  • Multi-lookup

55 of 90

Multi-point Queries

  • In general, it is not always possible to compute linearizable multi-point queries from low-level snapshots.
  • The vCAS paper formally defines the conditions a data structure needs to satisfy for this approach to work.
  • Most lock-free data structures satisfy this condition.

SPTDC 2023 PANAGIOTA FATOUROU

55

56 of 90

Comparison with Existing Techniques

SPTDC 2023 PANAGIOTA FATOUROU

Generality

Efficiency

LFCA [Winbland et al., SPAA’18]

KiWi [Basil et al., PPoPP’17]

PNB-BST [Fatourou et al., SPPA’19]

SnapTree [Bronson et al., PPoPP’10]

Epoch RQs [Arbel, Raviv et al., PPoPP’18]

SnapCollector [Petrank et al., PPoPP’13]

STM [Fernandez et al., PPoPP’11]

The VCAS Approach, Wei et al.

56

57 of 90

Practical Optimizations

  • Avoiding Indirection
  • Using exponential backoff to reduce contention when accessing the global timestamp
  • Removing redundant versions from the version list
  • Garbage collecting old versions

SPTDC 2023 PANAGIOTA FATOUROU

57

58 of 90

Avoiding Indirection

SPTDC 2023 PANAGIOTA FATOUROU

A

B

C

D

E

A

5

5

4

B’

B

3

C

C’

5

D

5

E

root

root

2

1

A’

Snapshottable BST

Expanding out VCAS objects

Without indirection

A, 2

B, 5

C, 5

D, 5

E, 5

root

A’, 1

B’, 4

C’, 3

Merge version list and

data structure nodes

58

59 of 90

Experimental Evaluation

  • Adding support for multi-point queries on top of existing concurrent lock-free data structures was very easy and required adding fewer than 150 lines of code (in C++).
  • The vCAS approach adds very little overhead to the original data structure
  • The vCAS approach (which is general-purpose) is often as fast as, or faster than, state-of-the-art lock-free data structures supporting range queries.

SPTDC 2023 PANAGIOTA FATOUROU

59

60 of 90

Summary of vCAS Technique

  • vCAS is an approach for adding snapshotting and multi-point queries to existing concurrent data structures
    • Easy-to-use: simply replace CAS with Versioned CAS
    • Efficient: both theoretically and practically
    • General: supports a wide range of data structures and multi-point queries
  • Code is available on GitHub: https://github.com/yuanhaow/vcaslib
  • Full version (with full proof of correctness & DS characterization for supporting multi-point queries) is available on arxiv: �https://arxiv.org/abs/2007.02372

SPTDC 2023 PANAGIOTA FATOUROU

60

61 of 90

Multi-Version Garbage Collection�

ANY SYSTEM THAT MAINTAINS MULTIPLE VERSIONS OF EACH OBJECT NEEDS A WAY OF EFFICIENTLY RECLAIMING THEM!

SPTDC 2023 PANAGIOTA FATOUROU

61

62 of 90

Research Question

SPTDC 2023 PANAGIOTA FATOUROU

How do we garbage collect, efficiently, for multiversion data structures?

62

63 of 90

Ben-David, Blelloch, Fatourou, Ruppert, Wei, DISC 2021

A general Multiversion Garbage Collection (GC) scheme with the following properties:

    • Progress: wait-free
    • Time: O(1) per reclaimed version, on average
    • Space: constant factor more versions than needed, plus an additive term

SPTDC 2023 PANAGIOTA FATOUROU

Previous solutions either use:

    • unbounded space [Fernandes et al., PPoPP’11] , or
    • O(P) time per reclaimed version [Lu et al. DISC’13] [Böttcher et al., VLDB’19]
      • P: number of processes

63

64 of 90

Multiversion Garbage Collection (MVGC)

  • How do we identify which versions are not needed?
  • How do we safely reclaim them?

SPTDC 2023 PANAGIOTA FATOUROU

X

Y

Objects

Versions

Maintaining all old versions high memory usage

64

time

65 of 90

Which Versions are Needed?

SPTDC 2023 PANAGIOTA FATOUROU

Time

X

Y

Objects

Versions

Timestamps of multipoint queries

Most recent versions needed

Versions needed by read-only operations

65

66 of 90

Related Work – Epoch-Based Solutions

  • Reclaim versions overwritten before the start of the oldest read-only operation

SPTDC 2023 PANAGIOTA FATOUROU

X

Operations started before this point have completed

Safe to collect

66

67 of 90

Related Work – Epoch-Based Solutions

  • Cons: High space usage
    • Unable to collect newer obsolete versions
    • Particularly bad with long read-only operations
        • E.g. database scans, large range queries
    • Paused process can lead to unbounded space usage

Pros: Fast, easy to implement

SPTDC 2023 PANAGIOTA FATOUROU

X

67

68 of 90

Related Work – Other Solutions

  • Techniques have been developed to address shortcomings of epoch-based solutions.
    • GMV [Lu et al. DISC’13], Hana [Lee et al. SIGMOD’16], Steam [Böttcher et al. VLDB’19]
    • Require Ω(P) time, on average, to collect each version in worst case executions.
      • P: number of processes
    • Keep up to P times more versions than necessary

SPTDC 2023 PANAGIOTA FATOUROU

68

69 of 90

What is the problem to solve?

Step 1: Identify obsolete versions

Step 2: Unlink from version list

Step 3: Reclaim memory of unlinked versions

SPTDC 2023 PANAGIOTA FATOUROU

69

70 of 90

Step 1: Identify obsolete versions

SPTDC 2023 PANAGIOTA FATOUROU

X

Y

70

Range Tracker

O(1) amortized time

O(needed) space

Unneeded Versions

Active multi-point queries

Obsolete Versions

71 of 90

Step 2: Unlink from version list

SPTDC 2023 PANAGIOTA FATOUROU

X

Y

n

unneeded

Need parent to unlink

We present a wait-free, amortized O(1) algorithm for remove()

71

72 of 90

Step 3: Reclaim memory of unlinked versions

SPTDC 2023 PANAGIOTA FATOUROU

X

n

P1

  • n is not safe to reclaim right away because a thread (P1) could be paused to access it
  • Using Hazard Pointers (HP) or Concurrent Reference Counting (CRC) would solve this problem, but
    • HP sacrifices wait-freedom
    • CRC sacrifices space bounds
  • Ben-David et al. presents a new safe reclamation scheme specifically for the doubly-linked version list implementation it provides

72

73 of 90

Step 1: Identifying Obsolete Versions

SPTDC 2023 PANAGIOTA FATOUROU

Range tracker

X

Y

x1

x4

x6

y2

y4

x7

y8

x5

y3

x2

y1

x3

y7

x6

Triplet [version, beginTS, endTS]

73

3

7

Announcement Array

deprecate(z5, startTS, endTS)

z5

Add new triplet & return obsolete versions

Amortized O(1) time

Observation: Triplets added by the same process have increasing endTS.

74 of 90

Step 1: Identifying Obsolete Versions

Question: Given a triplet T and a sorted announcement array, how long does it take to check if T is obsolete?

SPTDC 2023 PANAGIOTA FATOUROU

0

3

4

5

9

20

Sorted Announcement Array

z5

length = P (i.e. number of processes)

Triplet

Z5, 7, 12

=

Binary search for endTS

Compare with startTS

O(log P) time

74

75 of 90

Step 1: Identifying Obsolete Versions

Question: What if you were given a batch of O(P) triplets sorted by end timestamp?

SPTDC 2023 PANAGIOTA FATOUROU

1

3

4

5

9

20

Sorted Announcement Array

length = P (i.e. number of processes)

Batch of O(P) Triplets

Z5, 2, 8

X3, 7, 12

Y7, 10, 18

Merge()

O(P) time

Takes O(P) time

O(P log P) time

Takes O(P log P) time

Batch of O(P log P) Triplets

75

76 of 90

Range Tracker: Implementation

SPTDC 2023 PANAGIOTA FATOUROU

P1

Local list of triplets

Shared Queue

7

3

20

9

5

Announcement Array

Push()

Size at most O(P log P)

Pop() x2

Merge

Scan & Sort

Needed

Obsolete

Returned by deprecate

76

77 of 90

Step 2: Unlink from version list

SPTDC 2023 PANAGIOTA FATOUROU

X

Y

n

Obsolete

Concurrent removes

77

78 of 90

Concurrent Removes

SPTDC 2023 PANAGIOTA FATOUROU

A

B

C

D

A

B

C

D

Remove(B)

Remove(C)

Linked list structure corrupted

78

79 of 90

Concurrent Doubly Linked List, BBF+

  • TryAppend(oldHead, newHead)
      • Appends newHead only if current head equals oldHead
      • Returns true if successful

  • Remove(pointer to node)

SPTDC 2023 PANAGIOTA FATOUROU

79

80 of 90

BBF+: Linked List Remove

SPTDC 2023 PANAGIOTA FATOUROU

4

3

4

2

4

3

4

4

1

3

4

2

4

3

Implicitly defined tree

Version list

4

1

2

2

3

3

3

3

4

4

4

4

4

4

4

4

x

y

z

4

Marked for deletion

3

4

4

3

Amortized O(1) time

Worst case O(log L) time

L = # of nodes appended to version list

80

81 of 90

Space Overhead

SPTDC 2023 PANAGIOTA FATOUROU

4

3

4

2

4

1

3

4

2

4

3

Implicitly defined tree

Version list

4

1

2

2

3

3

3

4

4

4

4

4

4

3

2

1

2

3

Marked for deletion

O(log L) factor space overhead!

L = # of nodes appended to version list

81

82 of 90

Splicing Out Internal Nodes

SpliceUnmarkedLeft(Y): requires X > Y > Z and X unmarked

SPTDC 2023 PANAGIOTA FATOUROU

SpliceUnmarkedRight(Y): requires X < Y < Z and Z unmarked

X

Y

Z

May or may not be marked

X

Y

Z

X

Y

Z

X

Y

Z

82

83 of 90

BBF+: Space and Time

SPTDC 2023 PANAGIOTA FATOUROU

TryAppend

(worst-case)

Remove

(amortized)

Remove

(worst-case)

Space

BBF+

O(1)

O(1)

O(log L),

Wait-free

O(S + c log L)

L = # of successful TryAppends

S = # of nodes appended but not removed

c = # of ongoing remove operations

83

84 of 90

Overall Results

  • Time bounds:
    • O(1) time, on average, to identify, remove, and reclaim a version
    • Wait-free
  • Space bounds:
    • Number of unreclaimed versions O(# required versions) + additive term

Full version (with proof of correctness) available on arxiv:

https://arxiv.org/abs/2108.02775

SPTDC 2023 PANAGIOTA FATOUROU

84

85 of 90

New MVGC Schemes �[Wei, Blelloch, Fatourou, Ruppert, PPoPP 2023]

  • Use range tracker to get good space efficiency
  • Time efficiency: BBF+ is over optimized for worst-case

SPTDC 2023 PANAGIOTA FATOUROU

Concurrent remove()s

  • DL-RT: Range tracker + new doubly-linked version list
  • SL-RT: Range tracker + new singly-linked version list

85

86 of 90

Results

  • Two new MVGC schemes:
    • Fast and space efficient in practice
    • Strong space bounds in theory
  • Full paper (with proofs of correctness) is available on arxiv: https://arxiv.org/abs/2212.13557
  • Code is available on GitHub: �https://github.com/cmuparlay/ppopp23-mvgc

SPTDC 2023 PANAGIOTA FATOUROU

86

87 of 90

Conclusions

  • The vCAS Approach
  • Simple, constant-time approach to take a snapshot of a collection of CAS objects.
  • Technique to use snapshots to implement linearizable multi-point queries in many lock-free data structures.
  • Adding snapshots to a CAS-based data structure preserves the data structures’ asymptotic time bounds.
  • Every read is completed within a finite number of instructions (i.e. it is wait-free).

SPTDC 2023 PANAGIOTA FATOUROU

87

88 of 90

Conclusions

  • We present theoretically efficient solutions to the MVGC problem
  • Developed new techniques for all 3 steps:
    1. Identify obsolete versions
    2. Unlink from version list
    3. Reclaim memory of unlinked versions

  • The MVGC schemes:
    • Provide strong space and time bounds in theory.
    • Space and time efficient in practice.

SPTDC 2023 PANAGIOTA FATOUROU

88

89 of 90

Thank You!

SPTDC 2023 PANAGIOTA FATOUROU

89

This work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the “Second Call for HFRI Research Projects to support Faculty Members and Researchers” (project number: 3684, project acronym: PERSIST).

90 of 90

We are recruiting!