1 of 67

CS 5/6110, Software Correctness Analysis, Spring 2023

Ganesh Gopalakrishnan

School of Computing

University of Utah

Salt Lake City, UT 84112

2 of 67

Outline

  • Coherence
  • Complexity of coherence-checking of a trace
    • It is NP-complete
      • Good to understand the “hardness” of a coherence assertion
        • Trace-verification per trace is hard to exhaust …
          • But even if we do, it can be complex
      • Real verification : see below
  • Coherence protocol implementation
    • Implementations are horrendously complex (90 pages of TLA+ rules which expands to 3x that many pages of Murphi rules – at least)
      • 30 message types , handling orphaned messages , etc
    • ALL for speed
      • Else your phone will respond to your button-push of today tomorrow ☺
  • ”meta-circular assume/guarantee” of parameterized cache coherence protocols that can be practiced within Murphi
    • General param verif problem is undecidable
      • Reduction from at TM that halts in k-steps when started on an empty tape to a ring of k automata
  • How the CMP method looks like
  • This brings us to the topic of writing parameterized protocols using Murphi’s scalar sets
  • How to write one such model for our favorite locking protocol
    • And how verification scales with and without scalar sets
    • And how ROMP is able to make a big dent (ROMP is our random-walk Murphi)

3 of 67

What is memory coherence?

  • A view of shared memory from the point of view of
    • One address
    • More than one thread
    • It is basically sequential consistency, but for a memory system with exactly one address
      • What is sequential consistency? [Lamport’78]

  • Different from sequential consistency?
    • Yes!
      • An execution like this

4 of 67

An example demonstrating the diff starkly

P0 P1 (“P” = processes or shared-memory threads)

Initially a==b==0

Store(a,1); Store(b,2);

Load(b,0);//returns 0 Load(a,0);//returns 0

If you wear a pair of goggles that just sees black or just sees brown

There is an “SC explanation” -- try it

But if you see both colors together, then no SC explanation

Hence coherent (one address has always the latest value, since the real-time order of local operations is irrelevant at the global level)

5 of 67

Complexity of trace-verification for coherence

Suppose someone gives you a trace

Px:S(a,0); Py:L(a,1); Pz:S(a,33); Pz: (a,44); Px: S(a,3); ….

Where the P are processes (or threads) that share the SINGLE address a

And each Px: …. ; …. ; Px: … is listed in program order per Px

With the other Py , y ~= x listed according to some interleaving

Then what is the complexity of checking that such a trace and declaring whether it is coherent or not?

6 of 67

Complexity of trace-verification for coherence

NP-Complete!

7 of 67

Where is a good (accessible) proof? – ANS : Cantin’s paper

8 of 67

Where is a good (accessible) proof? – ANS : Cantin’s paper

Given four processes (or hardware threads) and reads/writes

Per location (“a_u” in this case) explain read-value outcomes.

Here we explain as if this interleaving took place.

The inability find such an explanation means the system is incoherent

Cool piece of encoding by Cantin: Given a 3SAT formula, he generates a shared memory execution such that the formula is SAT iff the execution is coherent!

9 of 67

Cantin’s reduction from SAT (general SAT which is = 3SAT wrt complexity)

Recall or know

That

2-sat

Is P-time

(= 2-coloring)

10 of 67

Recap / Summary

  • To define memory views as shared by multiple processors, we need to define a formal shared memory consistency model
  • Coherence is one of the basic models
    • Each location has a latest data that every reader agrees on
    • Also known as “per-location sequential consistency”
  • There are some interesting complexity results that help us understand coherence
    • Given a “finished execution trace”, checking coherence is NP-Complete

11 of 67

Implementing coherence

12 of 67

Coherence Verification

  • Given the complexity of coherence protocols, formal methods (model-checking mainly) is essential
  • Let us take a look at an academic protocol called The German Protocol
  • How does coherence verification scale?
    • Not well – today, large protocols take days to cover for one bit of data and 3 processors
    • Solutions
      • Derive cutoff bounds – Emerson and Kahlon
        • The bounds are large (7-8) and automatically computing bounds is not practical for large protocols
      • Do a parametric verification
        • Prove coherence for all ”N” – N is the number of cores/threads
        • This involves modeling 2-3 nodes explicitly and involves a manual abstraction/refinement loop – called CEGAR
          • Counter-Example Guided Abstraction Refinement
        • Involves designer input of non-interference lemmas
      • Do formal synthesis
        • Prof. Vijay Nagarajan will be presenting this when he joins us!

13 of 67

Basics about Transition Systems

  • Before we study the German protocol and how the parametric verification method I’m going to present works, let us discuss basic concepts about formal transition systems

14 of 67

Almost all specs for safety property checking look like this

  • Based on Guarded Commands

Rule1: g1 ==> a1

Rule2: g2 ==> a2

RuleN: gN ==> aN

Invariant P

  • Supported by tools such as Murphi (Stanford, Dill’s group)
  • Presents the behavior declaratively
    • Good for specifying “message packet” driven behaviors
    • Sequentially dependent actions can be strung using guards
  • “Rule Sets” can specify behaviors across axes of symmetry
    • Processors, memory locations, etc.
  • Simple and Universally Understood Semantics

14

15 of 67

Let us understand how we may safely transform such rule-based specifications (this is what we have to do in the parametric verification approach to be presented). The method is called the CMP method named after its inventors at Intel.

15

16 of 67

The name of the game – “invariants” !!

  • Much like loop invariants, except
    • These are invariants pertaining to formal state-transition systems
  • We are really talking about inductive invariants
    • Invariants P such that
      • They are true in the initial state
      • When held at a state, every transition rule preserves it
  • This is a hugely important topic (inductive invariant discovery)
    • Then you can do program-proofs by walking each path or transition rule!

17 of 67

Invariants, Inductive Invariants

  • Consider the transition system
    • Init : X == 1 (X is an int var … i.e. +, 0, - )
    • Tr1 : true 🡺 x += 2
    • Tr2 : true => x += 3
    • An invariant is x != 0
    • But is it inductive?
      • (X != 0) ==?==Tr1🡺 (X!=0) ?
      • (X != 0) ==?==Tr2🡺 (X!=0)?

18 of 67

Invariants, Inductive Invariants

  • Consider the transition system
    • Init : X == 1 (X is an int var … i.e. +, 0, - )
    • Tr1 : true 🡺 x += 2
    • Tr2 : true => x += 3
    • An invariant is x != 0
    • But is it inductive?
      • (X != 0) ==?==Tr1🡺 (X!=0) ?
      • (X != 0) ==?==Tr2🡺 (X!=0)?
    • No, for X == -2, it is not true!
      • So what is one inductive invariant for this transition system?

19 of 67

Invariants, Inductive Invariants

  • Consider the transition system
    • Init : X == 1 (X is an int var … i.e. +, 0, - )
    • Tr1 : true 🡺 x += 2
    • Tr2 : true => x += 3
    • An invariant is x != 0
    • But is it inductive?
      • (X != 0) ==?==Tr1🡺 (X!=0) ?
      • (X != 0) ==?==Tr2🡺 (X!=0)?
    • No, for X == -2, it is not true!
      • So what is one inductive invariant for this transition system?
        • How about X >= 0 ?
        • How about X >= -1 ?
        • How about X >= 1 ?
      • What is the strongest inductive invariant of the above three?
      • What is the strongest inductive invariant (period) ?
        • Answer : = reachable state space!

20 of 67

Tools to interactively discover invariants

  • Things like Paxos have been treated – which is nice
  • https://researchr.org/publication/fmcad-2021

21 of 67

Let us understand how we may safely transform such rule-based specifications. The first few will be warmups. Then the real thing!

  • Observation: Weakening a guard is sound
  • Suppose we add a disjunct as below (Cond1) and still manage to show that P is an invariant, then without adding Cond1, the result must still hold

Rule1: g1 \/ Cond1 ==> a1

Rule2: g2 ==> a2

Invariant P

  • Reason: Rule1 fires more often with Cond1 added!
  • May get false alarms (P may fail if Rule1 fires spuriously)
  • For many “weak properties” P, we can “get away” by guard weakening
    • This is a standard abstraction, first proposed by Kurshan (E.g. removing a module that is driving this module, letting inputs “dangle”)
  • BUT in the CMP method, we won’t do this – rather we will do guard strengthening!
  • Except it is useful to know this disjunction property in thinking about certain steps of CMP

21

22 of 67

But… Guard Strengthening is, by itself, Unsound

  • Strengthening a guard is not sound

Rule1: g1 /\ Cond1 ==> a1

Rule2: g2 ==> a2

Invariant P

  • Reason: Rule1 fires only when g1 /\ Cond1
  • So, less behaviors examined in checking P
    • Thus, verifying _with_ Cond1 means nothing for verification without Cond1
  • But hang on, there is more in the CMP method ☺

22

23 of 67

Guard Strengthening can be made sound, if the conjunct is implied by the guard

  • This is sound

Rule1: g1 /\ Cond1 ==> a1

Rule2: g2 ==> a2

Invariant P /\ (g1 ==> Cond1)

  • Reason: Rule1 fires only when g1 /\ Cond1
  • BUT, Cond1 is always implied by g1; we are showing g1🡺Cond1 is an invariant also , so no real loss of states over which Rule1 fires…
    • Call this “Guard Strengthening Supported by Lemma
  • Except, you are showing the invariant in the same modified system!
    • This is fine:
      • Initial state satisfies P, and also (g1 🡺 Cond1). Thus, whenever g1 is true in the initial state, Cond1 is an implied fact. So g1 /\ Cond1 is like g1 by itself.
      • Thus “a1” can be conducted in the initial state , to obtain the next set of states. (No change wrt g2 and a2, so they are like before.)
    • In general
      • At state t (by induction over time), we have P true and (g1 🡺 Cond1) true.
      • Thus , g1, when true, implies Cond1 at time t. Thus g1 /\ Cond1 is like g1 (same truth status) at time t
        • Thus we can obtain the state at t+1 safely via “a1”

23

24 of 67

Summary of Transformations so far (checkmark shows what’s safe)

24

X

25 of 67

The CMP Approach

  • Weaken to the Extreme
  • Then Strengthen Back Just Enough (to pass all properties)

25

26 of 67

Weaken to the Extreme sounds crazy at first!

Rule1: g1 \/ True ==> a1

Rule2: g2 ==> a2

Invariant P

26

The transition system above

can be transformed to the one below without any issues

(except the proof of P being an invariant might not go through) – but that will be fixed momentarily

Rule1: True ==> a1

Rule2: g2 ==> a2

Invariant P

27 of 67

Strengthen Back Some

27

Rule1: True /\ C1 ==> a1

Rule2: g2 ==> a2

Invariant P /\ (g1 => C1)

“Not Enough!” may be

the outcome of strengthening.

That is, while we added C1 back,

it may not be strong-enough.

How to pick C1 will be discussed soon.

28 of 67

Strengthen Back More…

28

Rule1: True /\ C1 /\ C2 ==> a1

Rule2: g2 ==> a2

Invariant P /\ (g1 => C1) /\ (g1 => C2)

“OK, just right!”

Rule1: True /\ C1 ==> a1

Rule2: g2 ==> a2

Invariant P /\ g1 => C1

“Not Enough!”

29 of 67

A Variation of Guard Strengthening Supported by Lemma: �Doing it in a meta-circular manner (i.e., the temporal induction I alluded to earlier…)

29

This is the approach in our work

30 of 67

Now the secret: in the CMP method, the designer decouples all nodes beyond k (typically 2) explicitly modeled nodes. Then, brings back the N-k nodes, but in ‘spirit’ – i.e., in terms of the non-interference conditions they must obey (note that “N” is a free parameter). ��The Ci are thus the non-interference lemmas. Each is discovered upon seeing a counterexample, and then added back into the system! If the coherence invariant is proved (usually this happens), then you ended up having used a model-checker to prove a parametric theorem – which is a big deal!

30

This is the approach in our work

31 of 67

This method has been perfected at Intel and in production use!

32 of 67

We will now present highlights of the German Protocol to tell you how coherence protocols look like (but this is like a “hello world” of cache protocols)

  • See german.m , german.pdf , and abs-german.pdf in the class directory
    • Lec19
    • Lec19/protocols
  • We will note down some highlights in the coming slides

33 of 67

CMP highlights from paper of C-T Chou

34 of 67

CMP highlights from paper of C-T Chou

35 of 67

CMP highlights from paper of C-T Chou

36 of 67

CMP highlights from paper of C-T Chou

The top-left rule is changed to the bottom-right rule in the CMP method, by introducing non-interference lemmas to strengthen the guards. These lemmas are proven in the same model being refined.

37 of 67

German protocol

38 of 67

German protocol

39 of 67

German protocol

40 of 67

German protocol

41 of 67

German protocol

42 of 67

German protocol

43 of 67

German protocol

44 of 67

German protocol

45 of 67

German protocol

46 of 67

German protocol

47 of 67

German protocol

48 of 67

German protocol

49 of 67

German protocol

50 of 67

German protocol

51 of 67

German protocol

52 of 67

Invariants of the German protocol

53 of 67

Locking protocols, Scalar Sets, etc

54 of 67

How to initialize the dist locking protocol w/o breaking symmetry required by scalar sets!

55 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

56 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

57 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

58 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

59 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

60 of 67

Code from dist termination… how initialization is done + how invariants are done (need to generalize)

61 of 67

Snippets from Peterson’s protocol with programmable bug-depth

62 of 67

Snippets from Peterson’s protocol with programmable bug-depth

63 of 67

Snippets from Peterson’s protocol with programmable bug-depth

64 of 67

Snippets from Peterson’s protocol with programmable bug-depth

65 of 67

Snippets from Peterson’s protocol with programmable bug-depth

66 of 67

Snippets from Peterson’s protocol with programmable bug-depth

67 of 67

Snippets from Peterson’s protocol with programmable bug-depth