1 of 25

CSE 452 - Discussion Section 3: git and Other Things

2 of 25

Lab 1 Recap

  • Most teams did very well.
  • It only gets harder from here, though. Read through my notes on Piazza.

3 of 25

Lab 2

Questions?

5 of 25

Weak Consistency/Disconnected Operation

Motivation

How can we support disconnected or weakly connected operation?

Applications

  • File synchronization across users / devices (e.g. Dropbox)
  • Source code control (e.g. git)
  • Intermittent connectivity (e.g. laptops, mobile, resource poor settings)

6 of 25

Answer in One Paper

Terry et al. “Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System.”

Think of it as a generalization of all the examples from the previous slide.

7 of 25

Consistency

Serializability/Linearizability/Sequential Consistency: very “strong” consistency guarantees; essentially, the system appears to the clients as a single, correct process

Eventual Consistency: much weaker guarantee; if all updates stop, then eventually all processes will converge to the same state

“Eventual consistency is no consistency”

8 of 25

Dealing With Eventually

We can often do better than “eventual” consistency, but if you ever want to allow disconnected operation, there is a certain set of problems that always comes up.

What happens when a disconnected client updates some data, another disconnected client updates the same data in a conflicting way, and they both come back online?

Hmm... That problem sounds familiar.
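One way to make the question concrete is a three-way comparison against the last version both clients saw before disconnecting. This is only a sketch: the function name `classify_update` and its return strings are made up for illustration and are not taken from Bayou or git.

```python
def classify_update(base, mine, theirs):
    """Compare two disconnected edits against their common base version."""
    if mine == theirs:
        return "same"          # both sides ended up with identical data
    if mine == base:
        return "take_theirs"   # only the other client changed the data
    if theirs == base:
        return "take_mine"     # only this client changed the data
    return "conflict"          # both changed it, in different ways


print(classify_update("v1", "v1", "v2"))
print(classify_update("v1", "v2", "v3"))
```

Only the last case needs a reconciliation policy; the other three can be resolved mechanically.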

9 of 25

Source Code Control

  • Disconnected Operation
    • Okay to read/write cached copy (different versions)
    • Check if it’s okay later, recover if necessary
  • Track history (with metadata)
  • Concurrent editing / Many contributors
  • Working copy: Don’t want files to change beneath you
    • Push/pull to server/peers
  • Cheap Branches / Merging
  • Centralized vs. Distributed

10 of 25

CVS (1990)

  • Client-server architecture
    • Check out a working copy
    • Check in your changes
  • Server arbitrates order of changes
    • Only accept changes to the most recent version of a file
    • Developers must always keep their files up to date
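The “only accept changes to the most recent version” rule can be sketched as a toy server. This is an illustrative model, not CVS's actual protocol; `CentralServer`, `StaleVersionError`, and the per-file revision counter are assumptions made for the example.

```python
class StaleVersionError(Exception):
    """Raised when a check-in is based on an out-of-date revision."""


class CentralServer:
    """Toy server that stores one revision counter and content per file."""

    def __init__(self):
        self.files = {}  # path -> (revision, content)

    def checkout(self, path):
        # Return the latest (revision, content); unknown files start at revision 0.
        return self.files.get(path, (0, ""))

    def checkin(self, path, base_revision, content):
        current_revision, _ = self.files.get(path, (0, ""))
        if base_revision != current_revision:
            # The client edited an old copy: it must update and merge first.
            raise StaleVersionError(
                f"{path}: based on r{base_revision}, server has r{current_revision}"
            )
        self.files[path] = (current_revision + 1, content)
        return current_revision + 1


server = CentralServer()
rev_a, _ = server.checkout("main.c")   # Alice and Bob check out the same revision
rev_b, _ = server.checkout("main.c")
server.checkin("main.c", rev_a, "edit from Alice")
try:
    server.checkin("main.c", rev_b, "edit from Bob")
except StaleVersionError as err:
    print("rejected:", err)
```

Bob's check-in is refused because the server has moved on since his checkout, which is why developers must keep their files up to date.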

11 of 25

CVS Workflow

15 of 25

CVS Limitations

  • Everyone is editing the same repository
    • How do you implement a complex feature without constantly conflicting?
  • No local version control
    • cvs commit ~ git commit && git push
  • No repository-wide log (history is kept per file)
    • How do we “time travel” and see history?
  • No versioning of moved/renamed files
  • Depends on a live server to operate
  • Branches are expensive; manual locks are common
  • No atomicity (e.g., a network failure mid-commit can leave the repository inconsistent)

16 of 25

Apache SVN (2000)

“CVS done right”

Improvements:

  • atomic commits
  • renamed/moved/copied files retain version history
  • versioning of directories and metadata
  • cheap branches/tagging (though only by convention)

Still active; all of Facebook’s source code was in a single SVN repo until 2014

17 of 25

git (2005)

  • Distributed!
    • Everyone is a replica
  • Cheap branches/merging
  • .git/
    • config
    • content-addressable filesystem
    • refs (named pointers to commits, e.g. branches and tags)

18 of 25

Content-Addressable Filesystem

  • Glorified hash-set (on disk)
  • Almost everything git stores is an “object”; can be loose or packed
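A minimal sketch of the idea, assuming git's default SHA-1 object format: an object's name is the hash of a short header plus its content, and loose objects are stored zlib-compressed. The `ObjectStore` class is a hypothetical in-memory stand-in for .git/objects, not git's own code.

```python
import hashlib
import zlib


def object_id(kind, payload):
    """Name an object by the SHA-1 of '<kind> <size>\\0' followed by its content."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


class ObjectStore:
    """In-memory stand-in for .git/objects, keyed by content hash."""

    def __init__(self):
        self.objects = {}

    def put(self, kind, payload):
        oid = object_id(kind, payload)
        header = f"{kind} {len(payload)}".encode() + b"\0"
        # Identical content always maps to the same key, so it is stored once.
        self.objects[oid] = zlib.compress(header + payload)
        return oid

    def get(self, oid):
        raw = zlib.decompress(self.objects[oid])
        header, _, payload = raw.partition(b"\0")
        kind = header.split(b" ")[0].decode()
        return kind, payload


store = ObjectStore()
first = store.put("blob", b"version 1\n")
second = store.put("blob", b"version 1\n")   # same content, same object
print(first == second, len(store.objects))
```

Because the key is derived from the content, two files with the same bytes share one object regardless of their paths.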

.git/objects/pack:

19 of 25

Logs (Commit Histories)

  • Complete log of changes (allows “time travel” through source history)
    • Directed acyclic graph (DAG)
  • commit
    • parent(s)
    • pointer to tree object (root directory)
    • metadata (e.g. author name, commit message)
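The commit structure above can be sketched as a small data type plus a graph walk. The field names, the string tree IDs, and the `history` helper are illustrative assumptions; real git stores these as hashed objects and orders its log output differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    tree: str                 # stand-in for the root tree object's ID
    message: str
    author: str
    parents: tuple = ()       # none (root), one (normal), two or more (merge)


def history(tip):
    """Return every commit reachable from tip, visiting each one once."""
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue          # shared ancestors are reached by several paths
        seen.add(commit.name)
        order.append(commit)
        stack.extend(commit.parents)
    return order


root = Commit("c1", "tree-1", "first commit", "alice")
left = Commit("c2", "tree-2", "feature work", "alice", (root,))
right = Commit("c3", "tree-3", "bug fix", "bob", (root,))
merge = Commit("c4", "tree-4", "merge feature", "alice", (left, right))

print([c.name for c in history(merge)])
```

The merge commit has two parents, so the history is a DAG rather than a list, and the `seen` set keeps the shared root from being listed twice.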

20 of 25

An Example

$ git init
$ echo "version 1" > test.txt
$ git add test.txt
$ git commit -m "first commit"

$ echo "version 2" > test.txt
$ echo "new file" > new.txt
$ git add ./
$ git commit -m "second commit"

$ mkdir bak
$ echo "version 1" > bak/test.txt
$ git add bak/
$ git commit -m "third commit"

22 of 25

[Figure: commit graph with commits “abc”, “def”, “ghi”, and “jkl”, labeled with states S <0, 0>, S <1, 0>, S <0, 1>, and S <1, 1>]

24 of 25

Reconciling Conflicts

  • Last writer wins (AFS, NFS)
  • User-specified conflict handler
  • Manual reconciliation (git, svn)
  • Conflict-free replicated data types (CRDTs)
    • Data types wherein conflicts are impossible
    • Operation-based CRDTs, a.k.a. commutative replicated data types (e.g., an add-only set)
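A minimal sketch of the add-only set, with the class name `GrowOnlySet` chosen for illustration. For brevity it merges whole states by set union, which is a simplification; an operation-based version would ship the individual add operations instead, and those commute for the same reason.

```python
class GrowOnlySet:
    """Add-only set: the only update is add, and merge is set union."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Union is commutative, associative, and idempotent, so replicas
        # converge no matter how many times or in what order they sync.
        return GrowOnlySet(self.items | other.items)


laptop = GrowOnlySet()
phone = GrowOnlySet()
laptop.add("milk")      # both replicas update while disconnected
phone.add("eggs")

assert laptop.merge(phone).items == phone.merge(laptop).items
print(sorted(laptop.merge(phone).items))
```

Since nothing is ever removed, no two updates can contradict each other, so there is nothing to reconcile. Supporting removal requires a richer design.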

25 of 25

Want More git Goodness?