CS61B: 2021
Lecture 12: Command Line Programming and Data Structures Preview
Random 61C Preview
Decimal, Binary, Hexadecimal
Before we start today, I’d like to cover a concept you’ll go over in 61C.
In the decimal number system, we have digits 0123456789.
In the binary number system, we have digits 01.
In the hexadecimal number system, we have digits 0123456789abcdef.
Decimal, Binary, Hexadecimal
The decimal number 932, the binary number 1110100100, and the hexadecimal number 3a4 are all the same exact number.
Hexadecimal is often used in computer science in lieu of decimal.
Today, we’ll use hexadecimal to represent large numbers.
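A quick way to check the 932 / 1110100100 / 3a4 claim is to let Java do the conversions. This is only a sketch; the class name BaseDemo and the helper parse are made up for illustration, while Integer.parseInt, Integer.toBinaryString, and Integer.toHexString are standard library methods.

```java
public class BaseDemo {
    /** Parses digits written in the given base (radix) into an int. */
    public static int parse(String digits, int radix) {
        return Integer.parseInt(digits, radix);
    }

    public static void main(String[] args) {
        // The same number written in three different bases.
        System.out.println(parse("932", 10));        // 932
        System.out.println(parse("1110100100", 2));  // 932
        System.out.println(parse("3a4", 16));        // 932

        // Going the other way: write 932 in binary and in hexadecimal.
        System.out.println(Integer.toBinaryString(932)); // 1110100100
        System.out.println(Integer.toHexString(932));    // 3a4
    }
}
```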
Command Line Compilation
Compilation
The standard tools for executing Java programs use a two step process:
Hello.java → javac (Compiler) → Hello.class → java (Interpreter) → stuff happens
In our course so far we’ve been using Intellij, which uses javac and java.
However, it is also possible to manually invoke javac and java ourselves.
public static void main(String[] args)
One Special Role for Strings: Command Line Arguments
public class ArgsDemo {
    /** Prints out the 0th command line argument. */
    public static void main(String[] args) {
        System.out.println(args[0]);
    }
}
jug ~/Dropbox/61b/lec/gitletIntro
$ java ArgsDemo hello some args
hello
ArgsSum Exercise
Goal: Create a program ArgsSum that prints out the sum of the command line arguments, assuming they are numbers.
One Special Role for Strings: Command Line Arguments
public class ArgsSum {
    /** Prints out the sum of arguments, assuming they are
     *  integers.
     */
    public static void main(String[] args) {
        int index = 0;
        int sum = 0;
        while (index < args.length) {
            sum = sum + Integer.parseInt(args[index]);
            index = index + 1;
        }
        System.out.println(sum);
    }
}
How’d we know to do this? We Googled “convert string integer java”.
$ java ArgsSum 1 2 3 4
10
Git: A Command Line Program
Git: A Command Line Tool
The git tool we’ve been using is a command line program.
jug ~/Dropbox/61b/lec/git
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
modified: HelloWorld.java
Git Source Code
Git is just a program.
Why are we talking about this?
Git
Git is a sophisticated piece of software. It relies on many ideas we have not yet covered:
Today, we’ll get a preview of the first three of these things, along with some insight into how git works.
Basic Git Functionality
Why Version Control?
Software development is an iterative process.
Maintaining multiple copies is useful:
Naive approach: Store a bunch of old versions in multiple directories, e.g. 2048UpFinallyWorks, 2048MergeBugFixed, etc.
The Naive Approach
The good:
Issues:
Version Control Software
There are many software packages out there that handle version control. Some popular systems:
These days, git is the most popular overall.
Today we’ll talk a little bit about how git works.
Git: How it Works
Every time you commit changes to a file, git stores a copy of the entire project in a secret folder on your computer called .git.
Let’s try this out.
V1: Hats.java, Cheese.java (“version 1 of my code”)
V2: Hats.java, Cheese.java (“fixed cheese bug”)
V3: Hats.java, Cheese.java (“added parmesan”)
$ subl Hats.java
$ subl Cheese.java
$ git add .; git commit -m "version 1 of my code"
$ subl Cheese.java
$ git add .; git commit -m "fixed cheese bug"
$ subl Cheese.java
$ git add .; git commit -m "added parmesan"
Various tricks are employed to avoid redundancy.
Git log
Can view history using the git log command:
V3: Hats.java, Cheese.java (“added parmesan”)
V2: Hats.java, Cheese.java (“fixed cheese bug”)
V1: Hats.java, Cheese.java (“version 1 of my code”)
Avoiding Redundancy
Consider the Following Scenario
Suppose a programmer makes 3 commits while working on a Java project.
V1: readme.txt
V2: readme.txt, game/Game.java, game/Test.java, utils/Utils.java
V3: readme.txt, game/Game.java, game/Test.java, utils/Utils.java
Approach 1: Store Multiple Copies of Everything
As noted before, every time you commit changes to a file, git stores a copy of the entire project in the .git folder as a new commit.
Naive approach: Each commit is stored in a subdirectory with copies of every file.
Easy to implement!
.git/v1/readme.txt
.git/v2/readme.txt
.git/v2/utils/Utils.java
.git/v2/game/Game.java
.git/v2/game/Test.java
.git/v3/readme.txt
.git/v3/utils/Utils.java
.git/v3/game/Game.java
.git/v3/game/Test.java
Naive approach is very inefficient. Here:
.git/v1/readme.txt
.git/v2/readme.txt
.git/v2/utils/Utils.java
.git/v2/game/Game.java
.git/v2/game/Test.java
.git/v3/readme.txt
.git/v3/utils/Utils.java
.git/v3/game/Game.java
.git/v3/game/Test.java
Eliminating Inefficiency
One obvious improvement: Don’t store multiple copies of the same file.
Approach 2: Store Only Files That Change
V1: readme.txt
V2: readme.txt, game/Game.java, game/Test.java, utils/Utils.java
V3: game/Game.java
Approach 2: Store Only Files That Change
In revised approach 2, we only store files that change.
.git/v1/readme.txt
.git/v2/readme.txt
.git/v2/utils/Utils.java
.git/v2/game/Game.java
.git/v2/game/Test.java
.git/v3/game/Game.java
Test Your Understanding
Suppose we have the commits for versions 1 through 5 stored in the folder below. If we check out commit version 4, which files will we use?
.git/v1/Hello.java
.git/v2/Hello.java
.git/v2/Friend.java
.git/v3/Friend.java
.git/v3/Egg.java
.git/v4/Friend.java
.git/v5/Hello.java
To figure out which files to copy, we had to walk through the entire commit history starting from commit 1.
V4: Hello.java → v2, Friend.java → v4, Egg.java → v3
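The walk through the commit history can be sketched in Java as follows. This is an illustration only: the class WalkHistory and the list-of-lists input are assumptions made for the example, and the sketch ignores deleted files, which the slides do not cover.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WalkHistory {
    /**
     * storedFiles.get(i) holds the names of the files saved in folder v(i+1).
     * Returns, for each filename, the newest stored version that is <= target.
     */
    public static Map<String, Integer> resolve(List<List<String>> storedFiles, int target) {
        Map<String, Integer> latest = new TreeMap<>();
        // Start at commit 1 and walk forward; a newer copy replaces an older one.
        for (int version = 1; version <= target; version += 1) {
            for (String filename : storedFiles.get(version - 1)) {
                latest.put(filename, version);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<List<String>> storedFiles = List.of(
            List.of("Hello.java"),                 // .git/v1
            List.of("Hello.java", "Friend.java"),  // .git/v2
            List.of("Friend.java", "Egg.java"),    // .git/v3
            List.of("Friend.java"),                // .git/v4
            List.of("Hello.java"));                // .git/v5
        // Prints {Egg.java=3, Friend.java=4, Hello.java=2}
        System.out.println(resolve(storedFiles, 4));
    }
}
```

Note that the loop touches every commit up to the target, which is exactly the cost Approach 3 is meant to avoid.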
Approach 3: Approach 2 but with Version Data Structure
Better approach: Rather than walking through commits from the beginning, explicitly store a list of “commits”, where each commit tells us the filename and version number for the files in that commit.
.git/v1/Hello.java
.git/v2/Hello.java
.git/v2/Friend.java
.git/v3/Friend.java
.git/v3/Egg.java
.git/v4/Friend.java
.git/v5/Hello.java
V1: Hello.java → v1
V2: Hello.java → v2, Friend.java → v2
V3: Hello.java → v2, Friend.java → v3, Egg.java → v3
V4: Hello.java → v2, Friend.java → v4, Egg.java → v3
V5: Hello.java → v5, Friend.java → v4, Egg.java → v3
Note: Each commit is a “map” or “dictionary”. For each filename, it maps that filename to a version number.
Example: V4 in Python might be represented by {"Hello.java": 2, "Friend.java": 4, "Egg.java": 3}
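In Java, the same idea might look like the sketch below. The class CommitMapDemo and its method names are hypothetical; the only assumption carried over from the slides is the .git/vN/ folder layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CommitMapDemo {
    /** Builds commit V4 from the slide: filename -> version number. */
    public static Map<String, Integer> commitV4() {
        Map<String, Integer> v4 = new LinkedHashMap<>();
        v4.put("Hello.java", 2);
        v4.put("Friend.java", 4);
        v4.put("Egg.java", 3);
        return v4;
    }

    /** Path of the stored copy, using the .git/vN/ layout from the slides. */
    public static String storedPath(Map<String, Integer> commit, String filename) {
        return ".git/v" + commit.get(filename) + "/" + filename;
    }

    public static void main(String[] args) {
        Map<String, Integer> v4 = commitV4();
        // Checking out V4 needs one lookup per file and no walk through history.
        for (String filename : v4.keySet()) {
            System.out.println(storedPath(v4, filename));
        }
        // Prints:
        // .git/v2/Hello.java
        // .git/v4/Friend.java
        // .git/v3/Egg.java
    }
}
```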
Test Your Understanding
Suppose we have the committed files for versions 1 through 5 stored in the folders on the left, and also have the list of commits on the right. Which files do we copy if we check out version 4?
.git/v1/X.java
.git/v1/Y.java
.git/v2/Y.java
.git/v3/Z.java
.git/v4/X.java
.git/v4/A.java
.git/v5/X.java
.git/v5/Y.java
V1: X.java → v1, Y.java → v1
V2: X.java → v1, Y.java → v2
V3: X.java → v1, Y.java → v2, Z.java → v3
V4: X.java → v4, Y.java → v2, Z.java → v3, A.java → v4
V5: X.java → v5, Y.java → v5, Z.java → v3, A.java → v4
Another Advantage of Approach 3
Approach 3 also allows us to avoid even more redundancy.
V1: X.java → v1, Y.java → v1
V2: X.java → v1, Y.java → v2
V3: X.java → v1, Y.java → v2, Z.java → v3
V4: X.java → v4, Y.java → v2, Z.java → v3, A.java → v4
V5: X.java → v5, Y.java → v5, Z.java → v3, A.java → v4
.git/v1/X.java
.git/v1/Y.java
.git/v2/Y.java
.git/v3/Z.java
.git/v4/X.java
.git/v4/A.java
.git/v5/X.java
.git/v5/Y.java
Another Advantage of Approach 3
Approach 3 also allows us to avoid even more redundancy.
V1: X.java → v1, Y.java → v1
V2: X.java → v1, Y.java → v2
V3: X.java → v1, Y.java → v2, Z.java → v3
V4: X.java → v4, Y.java → v2, Z.java → v3, A.java → v4
V5: X.java → v1, Y.java → v5, Z.java → v3, A.java → v4
.git/v1/X.java
.git/v1/Y.java
.git/v2/Y.java
.git/v3/Z.java
.git/v4/X.java
.git/v4/A.java
.git/v5/Y.java
Rather than store v5/X.java, our commit data structure specifies that v5’s X.java is the same as v1’s.
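One way to picture this reuse is sketched below, under the assumption that we can compare a file's new contents against the copies already stored. The class ReuseDemo, the method versionFor, and the file contents are all made up for illustration; this is not how git itself decides.

```java
import java.util.Map;
import java.util.TreeMap;

public class ReuseDemo {
    /**
     * Decides which version number a commit should record for a file.
     * stored maps version number -> contents of the copies saved so far.
     * If an identical copy already exists, its version number is reused;
     * otherwise the contents are saved under newVersion.
     */
    public static int versionFor(Map<Integer, String> stored, String contents, int newVersion) {
        for (Map.Entry<Integer, String> entry : stored.entrySet()) {
            if (entry.getValue().equals(contents)) {
                return entry.getKey(); // identical copy already stored
            }
        }
        stored.put(newVersion, contents); // genuinely new contents
        return newVersion;
    }

    public static void main(String[] args) {
        // Hypothetical contents of X.java as stored under v1 and v4.
        Map<Integer, String> storedX = new TreeMap<>();
        storedX.put(1, "class X { }");
        storedX.put(4, "class X { int a; }");

        // In commit 5 the programmer changes X.java back to its v1 contents.
        System.out.println(versionFor(storedX, "class X { }", 5)); // 1
        System.out.println(storedX.containsKey(5));                // false
    }
}
```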
Avoiding Redundancy with “Hashing”
Thought Experiment
Suppose we have two different programmers working on the same project.
Git is a distributed version control system. Everything is done locally, and there is no central server that stores everything.
Approach 4: Use Time and Date as the Version Number
Rather than using an escalating integer version number, we could use the current time and date.
V02_16_2021_03_29_45:
V02_16_2021_11_29_45:
V02_16_2021_13_29_45:
.git/02_16_2021_03_29_45/X.java
.git/02_16_2021_03_29_45/Y.java
.git/02_16_2021_11_29_45/Y.java
.git/02_16_2021_11_29_45/Z.java
.git/02_16_2021_13_29_45/X.java
What could go wrong in this approach?
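A sketch of one thing that can go wrong with timestamps: two programmers who commit within the same second get the same version name. The class TimestampVersion and the two commit times are invented for the example; the MM_dd_yyyy_HH_mm_ss pattern is simply chosen to match the folder names shown above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampVersion {
    // Pattern chosen to match names like 02_16_2021_03_29_45.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("MM_dd_yyyy_HH_mm_ss");

    /** Turns a commit time into a version name with one-second resolution. */
    public static String versionId(LocalDateTime commitTime) {
        return commitTime.format(FORMAT);
    }

    public static void main(String[] args) {
        // Two hypothetical programmers commit 0.8 seconds apart on different machines.
        LocalDateTime first = LocalDateTime.of(2021, 2, 16, 3, 29, 45, 100_000_000);
        LocalDateTime second = LocalDateTime.of(2021, 2, 16, 3, 29, 45, 900_000_000);

        System.out.println(versionId(first));                           // 02_16_2021_03_29_45
        System.out.println(versionId(first).equals(versionId(second))); // true
    }
}
```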
Approach 5: Use a “Hash” as the Version Number
The actual approach employed by Git is to use the “git-SHA1 hash” of a file as its version number.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
In binary: 110011011001100110111000110010001011100100111010001010101101101010111000111100101101101101111100110111011010111011010000100001100001100000101010110001010100010
In hexadecimal: 66ccdc645c9d156d5c796dbe6ed768430c1562a2
Note: The git-SHA1 hash of a file is the SHA1 hash of (the word “blob” + a space + the file size in bytes + a zero byte + the file contents).
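Here is a minimal Java sketch of that computation. Git hashes a short header (the word "blob", a space, the size in bytes, and a zero byte) followed by the file contents. The class name GitSha1 is made up; MessageDigest is the standard Java API for SHA-1.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitSha1 {
    /** Returns the 40-digit hexadecimal git-SHA1 hash of the given file contents. */
    public static String hash(byte[] contents) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Header: "blob", a space, the size in bytes, then a zero byte.
            String header = "blob " + contents.length + "\0";
            sha1.update(header.getBytes(StandardCharsets.UTF_8));
            sha1.update(contents);

            // Convert the 20 hash bytes to 40 hexadecimal digits.
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // An empty file: prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
        System.out.println(hash(new byte[0]));
    }
}
```

To check the sketch against real git, run git hash-object on a file and compare it with hash applied to that file's bytes.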
Using the git-SHA1 Hash
Example of how git uses the git-SHA1 hash to store HelloWorld.java
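As a rough sketch of the naming scheme: git keeps a newly stored object as a compressed file inside .git/objects, in a folder named after the first two hexadecimal digits of the hash, with the remaining 38 digits as the file name. The class ObjectPath below is made up for illustration and only builds the path; it does not read or write anything.

```java
public class ObjectPath {
    /** Splits a 40-digit hash into a 2-digit folder name and a 38-digit file name. */
    public static String pathFor(String hash) {
        return ".git/objects/" + hash.substring(0, 2) + "/" + hash.substring(2);
    }

    public static void main(String[] args) {
        // Prints .git/objects/66/ccdc645c9d156d5c796dbe6ed768430c1562a2
        System.out.println(pathFor("66ccdc645c9d156d5c796dbe6ed768430c1562a2"));
    }
}
```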
Let’s try it out!
Approach Comparison
Approach Number | Information to use as file version number | Downside |
1, 2, and 3 | Commit ID (that goes up by 1) that includes the file. | No central server to decide which commit is “next” if people are working offline. |
4 | Date and time of file. | Awkward to deal with simultaneous file changes. Not as elegant as SHA1-hash. |
5 | git-SHA1 hash of file. | ??? |
Can you think of something that could go wrong in approach 5?
SHA1-Hash
Good news: The chance that two files have the same SHA1 hash is 1 / 2^160, or roughly 1 / 10^48.
In other words, git has a “bug”, but it is unlikely to ever occur in the history of the universe.
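To get a feel for the size of that number: a SHA1 hash is 160 bits long, so there are 2^160 possible hashes. The sketch below (class name HashSpace is made up) prints that count using BigInteger.

```java
import java.math.BigInteger;

public class HashSpace {
    /** Number of distinct 160-bit hash values. */
    public static BigInteger possibleHashes() {
        return BigInteger.valueOf(2).pow(160);
    }

    public static void main(String[] args) {
        BigInteger count = possibleHashes();
        System.out.println(count);
        // The count has 49 decimal digits, i.e. it is a bit more than 10^48.
        System.out.println(count.toString().length()); // 49
    }
}
```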
Added Benefit of SHA1-Hashing: Security
Git uses the git-SHA1 hash to verify file integrity.
Serializable and Storing Data Structures
Git Commits
Every commit in git stores (at least):
Git Commit IDs
The commit ID is the git-SHA1 hash of the commit.
Representing a Commit in Java
Suppose we have the Commit class below.
public class Commit {
    public String author;
    public String date;
    public String commitMessage;
    public String parentID;
    ...
}
Storing Commits
When a user of your project 2 creates a commit, you’ll need to somehow store the object below so that it can be read later.
public class Commit {
    public String author;
    public String date;
    public String commitMessage;
    public String parentID;
    ...
}
Storing Commits using Serializable
Java has a built-in interface called Serializable. Objects of a class that implements it can be written out as bytes (for example, to a file) and read back later.
import java.io.Serializable;

public class Commit implements Serializable {
    public String author;
    public String date;
    public String commitMessage;
    public String parentID;
    ...
}
Let’s see a quick demo.
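The live demo is not captured in these notes, so here is a minimal sketch of what saving and restoring a Serializable object can look like. The class SerializeDemo, the cut-down Commit stand-in, and the example author are assumptions for illustration; ObjectOutputStream and ObjectInputStream are the standard Java classes for this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializeDemo {
    /** Cut-down stand-in for the lecture's Commit class. */
    static class Commit implements Serializable {
        String author;
        String commitMessage;

        Commit(String author, String commitMessage) {
            this.author = author;
            this.commitMessage = commitMessage;
        }
    }

    /** Converts an object into bytes that could be written to a file. */
    public static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object from bytes produced by toBytes. */
    public static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Commit original = new Commit("jug", "added parmesan");
        byte[] saved = toBytes(original);
        Commit restored = (Commit) fromBytes(saved);
        System.out.println(restored.author + ": " + restored.commitMessage); // jug: added parmesan
    }
}
```

The sketch keeps the bytes in memory so it runs anywhere. To store a commit on disk, the same ObjectOutputStream can wrap a FileOutputStream instead.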
Branching
Merging
A common feature in version control systems is the ability to create branches.
Diagram: d1 “version 1 of my code”, then aa “fixed cheese bug”, then 7e “added parmesan” (master)
Note: An earlier version of the slide had some diagram errors.
$ git log --graph --oneline --all --decorate
* 7e41ce1 (HEAD, master) added parmesan
* aa45fbd fixed cheese bug
* d1bde19 version 1 of my code
Merging
A common feature in version control systems is the ability to create branches.
$ git checkout -b WithSwiss aa45fbd
$ subl Cheese.java
$ git add .; git commit -m "added swiss"
$ git log --graph --oneline --all --decorate
* 33c7a92 (HEAD, WithSwiss) added swiss
| * 7e41ce1 (master) added parmesan
|/
* aa45fbd fixed cheese bug
* d1bde19 version 1 of my code
Diagram: d1 “version 1 of my code”, then aa “fixed cheese bug”, then two branches: 7e “added parmesan” (master) and 33 “added swiss” (WithSwiss)
Merging
Can switch back to the master branch with checkout.
$ git checkout master
$ git log --graph --oneline --all --decorate
* 33c7a92 (WithSwiss) added swiss
| * 7e41ce1 (HEAD, master) added parmesan
|/
* aa45fbd fixed cheese bug
* d1bde19 version 1 of my code
Diagram: d1 “version 1 of my code”, then aa “fixed cheese bug”, then two branches: 7e “added parmesan” (master) and 33 “added swiss” (WithSwiss)
Merging
After switching back to master branch, can continue to make changes.
$ git checkout master
$ subl Cheese.java
$ git add .; git commit -m "fixed parm"
$ git log --graph --oneline --all --decorate
* 2720092 (HEAD, master) fixed parm
* 7e41ce1 added parmesan
| * 33c7a92 (WithSwiss) added swiss
|/
* aa45fbd fixed cheese bug
* d1bde19 version 1 of my code
Diagram: d1 “version 1 of my code”, then aa “fixed cheese bug”, then two branches: 7e “added parmesan” followed by 27 “fixed parm” (master), and 33 “added swiss” (WithSwiss)
Merging
Can (attempt to) merge branches.
Stuff that was in Cheese.java in the master branch, but not in the WithSwiss branch
Stuff that was in Cheese.java in the WithSwiss branch, but not in the master branch
$ git merge WithSwiss
Auto-merging Cheese.java
CONFLICT (content): Merge conflict in Cheese.java
Automatic merge failed; fix conflicts and commit the result.
Merging
After resolving conflict and making a new commit:
$ git add .; git commit -m "resolve merge"
$ git log --graph --oneline --all --decorate
* faff9d1 (HEAD, master) resolve merge
|\
| * 33c7a92 (WithSwiss) added swiss
* | 2720092 fixed parm
* | 7e41ce1 added parmesan
|/
* aa45fbd fixed cheese bug
* d1bde19 version 1 of my code
Diagram: d1 “version 1 of my code”, then aa “fixed cheese bug”, then two branches: 7e “added parmesan” followed by 27 “fixed parm”, and 33 “added swiss” (WithSwiss); fa “resolve merge” (master) joins 27 and 33
Merging
After resolving the conflict.
Writing merge will be very tough.
Note: Commits are no longer a linked list.
Diagram: d1 “version 1 of my code”, then aa “fixed cheese bug”, then two branches: 7e “added parmesan” followed by 27 “fixed parm”, and 33 “added swiss” (WithSwiss); fa “resolve merge” (master) joins 27 and 33
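Why commits are no longer a linked list: the merge commit fa has two parents, so a single parentID field is not enough. The sketch below stores a list of parent IDs per commit, using the abbreviated IDs from the diagram. The class CommitGraph and its methods are hypothetical and not part of git or the project skeleton.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommitGraph {
    /** Maps each abbreviated commit ID from the slides to the IDs of its parents. */
    public static Map<String, List<String>> lectureHistory() {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("d1", List.of());            // version 1 of my code
        parents.put("aa", List.of("d1"));        // fixed cheese bug
        parents.put("7e", List.of("aa"));        // added parmesan
        parents.put("33", List.of("aa"));        // added swiss
        parents.put("27", List.of("7e"));        // fixed parm
        parents.put("fa", List.of("27", "33"));  // resolve merge: two parents
        return parents;
    }

    /** Collects every commit reachable from start by following parent links. */
    public static List<String> ancestors(Map<String, List<String>> parents, String start) {
        List<String> seen = new ArrayList<>();
        collect(parents, start, seen);
        return seen;
    }

    private static void collect(Map<String, List<String>> parents, String id, List<String> seen) {
        if (seen.contains(id)) {
            return; // already visited via another branch
        }
        seen.add(id);
        for (String parent : parents.get(id)) {
            collect(parents, parent, seen);
        }
    }

    public static void main(String[] args) {
        // Prints [fa, 27, 7e, aa, d1, 33]
        System.out.println(ancestors(lectureHistory(), "fa"));
    }
}
```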
Conclusion
Today we got a sneak peek into how git works under the hood.