1 of 67

CS 5/6110, Software Correctness Analysis, Spring 2023

Ganesh Gopalakrishnan

School of Computing

University of Utah

Salt Lake City, UT 84112

2 of 67

Outline

Coherence
Complexity of coherence-checking of a trace

It is NP-complete

Good to understand the “hardness” of a coherence assertion

Trace-verification per trace is hard to exhaust …

But even if we do, it can be complex

Real verification : see below

Coherence protocol implementation

Implementations are horrendously complex (90 pages of TLA+ rules which expands to 3x that many pages of Murphi rules – at least)

30 message types , handling orphaned messages , etc

ALL for speed

Else your phone will respond to your button-push of today tomorrow ☺

”meta-circular assume/guarantee” of parameterized cache coherence protocols that can be practiced within Murphi

General param verif problem is undecidable

Reduction from at TM that halts in k-steps when started on an empty tape to a ring of k automata

How the CMP method looks like
This brings us to the topic of writing parameterized protocols using Murphi’s scalar sets
How to write one such model for our favorite locking protocol

And how verification scales with and without scalar sets
And how ROMP is able to make a big dent (ROMP is our random-walk Murphi)

3 of 67

What is memory coherence?

A view of shared memory from the point of view of

One address
More than one thread
It is basically sequential consistency, but for a memory system with exactly one address

What is sequential consistency? [Lamport’78]

Very good descriptions here: https://en.wikipedia.org/wiki/Sequential_consistency

Different from sequential consistency?

Yes!

An execution like this

4 of 67

An example demonstrating the diff starkly

P0 P1 (“P” = processes or shared-memory threads)

Initially a==b==0

Store(a,1); Store(b,2);

Load(b,0);//returns 0 Load(a,0);//returns 0

If you wear a pair of goggles that just sees black or just sees brown

There is an “SC explanation” -- try it

But if you see both colors together, then no SC explanation

Hence coherent (one address has always the latest value, since the real-time order of local operations is irrelevant at the global level)

5 of 67

Complexity of trace-verification for coherence

Suppose someone gives you a trace

Px:S(a,0); Py:L(a,1); Pz:S(a,33); Pz: (a,44); Px: S(a,3); ….

Where the P are processes (or threads) that share the SINGLE address a

And each Px: …. ; …. ; Px: … is listed in program order per Px

With the other Py , y ~= x listed according to some interleaving

Then what is the complexity of checking that such a trace and declaring whether it is coherent or not?

6 of 67

Complexity of trace-verification for coherence

NP-Complete!

7 of 67

Where is a good (accessible) proof? – ANS : Cantin’s paper

8 of 67

Where is a good (accessible) proof? – ANS : Cantin’s paper

Given four processes (or hardware threads) and reads/writes

Per location (“a_u” in this case) explain read-value outcomes.

Here we explain as if this interleaving took place.

The inability find such an explanation means the system is incoherent

Cool piece of encoding by Cantin: Given a 3SAT formula, he generates a shared memory execution such that the formula is SAT iff the execution is coherent!

9 of 67

Cantin’s reduction from SAT (general SAT which is = 3SAT wrt complexity)

Recall or know

That

2-sat

Is P-time

(= 2-coloring)

10 of 67

Recap / Summary

To define memory views as shared by multiple processors, we need to define a formal shared memory consistency model
Coherence is one of the basic models

Each location has a latest data that every reader agrees on
Also known as “per-location sequential consistency”

There are some interesting complexity results that help us understand coherence

Given a “finished execution trace”, checking coherence is NP-Complete

Very insightful proof by Jason Cantin et al
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1435343

11 of 67

Implementing coherence

Snoopy-bus protocols (still present at smaller scales)
Directory-based protocols (more scalable)
See https://www.morganclaypool.com/doi/pdf/10.2200/S00962ED2V01Y201910CAC049 for Thu’s colloq speaker’s book

12 of 67

Coherence Verification

Given the complexity of coherence protocols, formal methods (model-checking mainly) is essential
Let us take a look at an academic protocol called The German Protocol
How does coherence verification scale?

Not well – today, large protocols take days to cover for one bit of data and 3 processors
Solutions

Derive cutoff bounds – Emerson and Kahlon

The bounds are large (7-8) and automatically computing bounds is not practical for large protocols

Do a parametric verification

Prove coherence for all ”N” – N is the number of cores/threads
This involves modeling 2-3 nodes explicitly and involves a manual abstraction/refinement loop – called CEGAR

Counter-Example Guided Abstraction Refinement

Involves designer input of non-interference lemmas

Do formal synthesis

Prof. Vijay Nagarajan will be presenting this when he joins us!

13 of 67

Basics about Transition Systems

Before we study the German protocol and how the parametric verification method I’m going to present works, let us discuss basic concepts about formal transition systems

14 of 67

Almost all specs for safety property checking look like this

Based on Guarded Commands

Rule1: g1 ==> a1

Rule2: g2 ==> a2

…

RuleN: gN ==> aN

Invariant P

Supported by tools such as Murphi (Stanford, Dill’s group)
Presents the behavior declaratively

Good for specifying “message packet” driven behaviors
Sequentially dependent actions can be strung using guards

“Rule Sets” can specify behaviors across axes of symmetry

Processors, memory locations, etc.

Simple and Universally Understood Semantics

14

15 of 67

Let us understand how we may safely transform such rule-based specifications (this is what we have to do in the parametric verification approach to be presented). The method is called the CMP method named after its inventors at Intel.

15

16 of 67

The name of the game – “invariants” !!

Much like loop invariants, except

These are invariants pertaining to formal state-transition systems

We are really talking about inductive invariants

Invariants P such that

They are true in the initial state
When held at a state, every transition rule preserves it

This is a hugely important topic (inductive invariant discovery)

Then you can do program-proofs by walking each path or transition rule!

17 of 67

Invariants, Inductive Invariants

Consider the transition system

Init : X == 1 (X is an int var … i.e. +, 0, - )
Tr1 : true 🡺 x += 2
Tr2 : true => x += 3
An invariant is x != 0
But is it inductive?

(X != 0) ==?==Tr1🡺 (X!=0) ?
(X != 0) ==?==Tr2🡺 (X!=0)?

18 of 67

Invariants, Inductive Invariants

Consider the transition system

Init : X == 1 (X is an int var … i.e. +, 0, - )
Tr1 : true 🡺 x += 2
Tr2 : true => x += 3
An invariant is x != 0
But is it inductive?

(X != 0) ==?==Tr1🡺 (X!=0) ?
(X != 0) ==?==Tr2🡺 (X!=0)?

No, for X == -2, it is not true!

So what is one inductive invariant for this transition system?

19 of 67

Invariants, Inductive Invariants

Consider the transition system

Init : X == 1 (X is an int var … i.e. +, 0, - )
Tr1 : true 🡺 x += 2
Tr2 : true => x += 3
An invariant is x != 0
But is it inductive?

(X != 0) ==?==Tr1🡺 (X!=0) ?
(X != 0) ==?==Tr2🡺 (X!=0)?

No, for X == -2, it is not true!

So what is one inductive invariant for this transition system?

How about X >= 0 ?
How about X >= -1 ?
How about X >= 1 ?
…

What is the strongest inductive invariant of the above three?
What is the strongest inductive invariant (period) ?

Answer : = reachable state space!

20 of 67

Tools to interactively discover invariants

The I4 tool, and this hotos paper

https://web.eecs.umich.edu/~manosk/assets/papers/i4-hotos19.pdf

https://web.eecs.umich.edu/~manosk/projects.html
https://web.eecs.umich.edu/~manosk/assets/papers/i4-hotos19-slides.pdf

Things like Paxos have been treated – which is nice
https://researchr.org/publication/fmcad-2021

21 of 67

Let us understand how we may safely transform such rule-based specifications. The first few will be warmups. Then the real thing!

Observation: Weakening a guard is sound
Suppose we add a disjunct as below (Cond1) and still manage to show that P is an invariant, then without adding Cond1, the result must still hold

Rule1: g1 \/ Cond1 ==> a1

Rule2: g2 ==> a2

Invariant P

Reason: Rule1 fires more often with Cond1 added!
May get false alarms (P may fail if Rule1 fires spuriously)
For many “weak properties” P, we can “get away” by guard weakening

This is a standard abstraction, first proposed by Kurshan (E.g. removing a module that is driving this module, letting inputs “dangle”)

BUT in the CMP method, we won’t do this – rather we will do guard strengthening!
Except it is useful to know this disjunction property in thinking about certain steps of CMP

21

22 of 67

But… Guard Strengthening is, by itself, Unsound

Strengthening a guard is not sound

Rule1: g1 /\ Cond1 ==> a1

Rule2: g2 ==> a2

Invariant P

Reason: Rule1 fires only when g1 /\ Cond1
So, less behaviors examined in checking P

Thus, verifying _with_ Cond1 means nothing for verification without Cond1

But hang on, there is more in the CMP method ☺

22

23 of 67

Guard Strengthening can be made sound, if the conjunct is implied by the guard

This is sound

Rule1: g1 /\ Cond1 ==> a1

Rule2: g2 ==> a2

Invariant P /\ (g1 ==> Cond1)

Reason: Rule1 fires only when g1 /\ Cond1
BUT, Cond1 is always implied by g1; we are showing g1🡺Cond1 is an invariant also , so no real loss of states over which Rule1 fires…

Call this “Guard Strengthening Supported by Lemma”

Except, you are showing the invariant in the same modified system!

This is fine:

Initial state satisfies P, and also (g1 🡺 Cond1). Thus, whenever g1 is true in the initial state, Cond1 is an implied fact. So g1 /\ Cond1 is like g1 by itself.
Thus “a1” can be conducted in the initial state , to obtain the next set of states. (No change wrt g2 and a2, so they are like before.)

In general

At state t (by induction over time), we have P true and (g1 🡺 Cond1) true.
Thus , g1, when true, implies Cond1 at time t. Thus g1 /\ Cond1 is like g1 (same truth status) at time t

Thus we can obtain the state at t+1 safely via “a1”

23

24 of 67

Summary of Transformations so far (checkmark shows what’s safe)

24

✔

X

25 of 67

The CMP Approach

Weaken to the Extreme
Then Strengthen Back Just Enough (to pass all properties)

25

26 of 67

Weaken to the Extreme sounds crazy at first!

Rule1: g1 \/ True ==> a1

Rule2: g2 ==> a2

Invariant P

26

The transition system above

can be transformed to the one below without any issues

(except the proof of P being an invariant might not go through) – but that will be fixed momentarily

Rule1: True ==> a1

Rule2: g2 ==> a2

Invariant P

27 of 67

Strengthen Back Some

27

Rule1: True /\ C1 ==> a1

Rule2: g2 ==> a2

Invariant P /\ (g1 => C1)

“Not Enough!” may be

the outcome of strengthening.

That is, while we added C1 back,

it may not be strong-enough.

How to pick C1 will be discussed soon.

28 of 67

Strengthen Back More…

28

Rule1: True /\ C1 /\ C2 ==> a1

Rule2: g2 ==> a2

Invariant P /\ (g1 => C1) /\ (g1 => C2)

“OK, just right!”

Rule1: True /\ C1 ==> a1

Rule2: g2 ==> a2

Invariant P /\ g1 => C1

“Not Enough!”

29 of 67

A Variation of Guard Strengthening Supported by Lemma: �Doing it in a meta-circular manner (i.e., the temporal induction I alluded to earlier…)

29

✔

This is the approach in our work

30 of 67

Now the secret: in the CMP method, the designer decouples all nodes beyond k (typically 2) explicitly modeled nodes. Then, brings back the N-k nodes, but in ‘spirit’ – i.e., in terms of the non-interference conditions they must obey (note that “N” is a free parameter). ��The Ci are thus the non-interference lemmas. Each is discovered upon seeing a counterexample, and then added back into the system! If the coherence invariant is proved (usually this happens), then you ended up having used a model-checker to prove a parametric theorem – which is a big deal!

30

✔

This is the approach in our work

31 of 67

This method has been perfected at Intel and in production use!

See https://www.cs.utexas.edu/~hunt/FMCAD/FMCAD09/slides/talupur.pdf
https://dl.acm.org/doi/pdf/10.5555/1517424.1517434
Designers write “protocol flow” diagrams as part of standard documentation

These are mined to obtain the non-interference lemmas

See http://formalverification.cs.utah.edu/presentations/fmcad04_tutorial2/chou/ctchou-tutorial.pdf for details of how this is done
NOTE the table-style specification recommended in the above tutorial at fmcad04 !!

32 of 67

We will now present highlights of the German Protocol to tell you how coherence protocols look like (but this is like a “hello world” of cache protocols)

See german.m , german.pdf , and abs-german.pdf in the class directory

Lec19
Lec19/protocols

We will note down some highlights in the coming slides

33 of 67

CMP highlights from paper of C-T Chou

34 of 67

CMP highlights from paper of C-T Chou

35 of 67

CMP highlights from paper of C-T Chou

36 of 67

CMP highlights from paper of C-T Chou

The top-left rule is changed to the bottom-right rule in the CMP method, by introducing non-interference lemmas to strengthen the guards. These lemmas are proven in the same model being refined.

37 of 67

German protocol

38 of 67

German protocol

39 of 67

German protocol

40 of 67

German protocol

41 of 67

German protocol

42 of 67

German protocol

43 of 67

German protocol

44 of 67

German protocol

45 of 67

German protocol

46 of 67

German protocol

47 of 67

German protocol

48 of 67

German protocol

49 of 67

German protocol

50 of 67

German protocol

51 of 67

German protocol

52 of 67

Invariants of the German protocol

53 of 67

Locking protocols, Scalar Sets, etc

54 of 67

How to initialize the dist locking protocol w/o breaking symmetry required by scalar sets!

55 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

56 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

57 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

58 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

59 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

60 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

61 of 67