Compression, Complexity, and P = NP
1
Lecture 39
CS61B, Spring 2023 @ UC Berkeley
Josh Hug and Justin Yokota
Announcements
My office hours today have moved to 3:30 - 4:30 (instead of 4:00 - 5:00).
I’m exploring larger room options or even doing a second lecture on Friday.
My usual goal is to structure lectures such that if you are paying close attention you can’t help but understand.
I will have office hours next week. Times TBD.
Compression Model 1 vs. Model 2 Review
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
Last Time: Compression
We saw one technique for compression called Huffman Coding.
01010101000001010101110...
Compression
Algorithm C
1001010101...
01010101000001010101110...
Decompression
Algorithm C-1
1001010101...
Bitstream B
Compressed bits C(B)
C(B)
B
Comparing Compression Algorithms
Example: What is the best way to compress mobydick.txt?
Algorithm | Uncompressed size (bits) | Compressed size (bits) |
zip | 5145656 | 2091000 |
huffman | 5145656 | 3412896 |
bzip2 | 5145656 | 1805288 |
Comparing Compression Algorithms
Example: What is the best way to compress mobydick.txt?
One problem: What if someone writes a custom compression algorithm?
Algorithm | Uncompressed size (bits) | Compressed size (bits) |
zip | 5145656 | 2091000 |
huffman | 5145656 | 3412896 |
bzip2 | 5145656 | 1805288 |
Ultra Compression
public static String greatmobydecompress(Bitstream bs) {
if (bs.toInt() == 0) {
return "CALL me Ishmael. Some years ago never mind how...";
}
if (bs.toInt() == 1) {
return "It was the best of times, it was the worst of times...";
}
return bs.dropFirstBit().toString();
}
Algorithm | Uncompressed size (bits) | Compressed size (bits) |
zip | 5145656 | 2091000 |
huffman | 5145656 | 3412896 |
bzip2 | 5145656 | 1805288 |
greatmobydecompress | 5145656 | 1 |
A Flaw in Compression Model #1
Source code for decompression algorithm itself might be highly complex.
1
0100001101000001...
mobydick.txt
Compressed Bits
Decompression
Algorithm C-1
1 bit
5145656 bits
10561262 bits
Compression Models #1 and #2
0100001101000001...
Compression
Algorithm C
1
0100001101000001...
Decompression
Algorithm C-1
1
Bitstream B
Compressed bits C(B)
C(B)
B
01110000...1
GreatMobyDecompressWithHardCodedOneBit.java
Decompression Algorithm and C(B)
Compression
Decompression Model #1
Decompression Model #2: Include the code for the decompression algorithm as part of the compressed output.
Interpreter
5,145,656 bits
0100001101000001...
mobydick.txt
Compression Model 2
The goal of a compression algorithm is to find short sequences of bits that generate desired longer sequences of bits.
Interpreter
5,145,656 bits
0100001101000001...
mobydick.txt
Great Moby Compression Algorithm
01110000...1
GreatMobyDecompressWithHardCodedOneBit.java
Decompression Algorithm and C(B)
10,561,262
+ 1 =
10,561,263 bits
Compression Model 2
The goal of a compression algorithm is to find short sequences of bits that generate desired longer sequences of bits.
Interpreter
5,145,656 bits
0100001101000001...
mobydick.txt
Huffman Compression Algorithm
01110000...1
HuffmanWithHardCoded CompressedMobyDick.java
Decompression Algorithm and C(B)
45,960 + 2,091,000 =
2,136,960 bits
Model 1 vs. Model 2 Compression
Algorithm | Uncompressed size (bits) | Compressed size (bits) | Compressed size using model 2 (bits) |
zip | 5145656 | 2,091,000 | 2,091,000 + 67,160 |
huffman | 5145656 | 3,412,896 | 3,412,896 + 45,960 |
bzip2 | 5145656 | 1,805,288 | 1,805,288 + 74,984 |
greatmobydecompress | 5145656 | 1 | 1 + 10,561,262 |
Twice as big because this algorithm also has Great Expectations hard coded into it!
Seeking Optimal Compression of the HugPlant
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
Compression Model 2
The goal of a compression algorithm is to find short sequences of bits that generate desired longer sequences of bits.
Interpreter
Huffman Compression Algorithm
01110000...01110100...
HuffmanWithHardCoded CompressedHugPlant.java
Decompression Algorithm and C(B)
45,960 + 1,994,024 =
2,039,984 bits
8,389,584 bits
0100001001001101...
HugPlant.bmp
B
Even Better Compression
Compression ratio of 25% is certainly very impressive, but we can do much better. MysteryX achieves a 0.35% compression ratio!
29,432 bits
0110100101101101...
MysteryX
Interpreter
01110000...01110100...
HuffmanWithHardCoded CompressedHugPlant.java
Interpreter
8,389,584 bits
0100001001001101...
HugPlant.bmp
2,037,424 bits
8,389,584 bits
0100001001001101...
HugPlant.bmp
Even Better Compression
Compression ratio of 25% is certainly very impressive, but we can do much better. MysteryX achieves a 0.35% compression ratio!
29,432 bits
0110100101101101...
MysteryX
Interpreter
01110000...01110100...
HuffmanWithHardCoded CompressedHugPlant.java
Interpreter
8,389,584 bits
0100001001001101...
HugPlant.bmp
2,037,424 bits
8,389,584 bits
0100001001001101...
HugPlant.bmp
Question: What is mystery X?
MysteryX: HugPlant.java
MysteryX is just HugPlant.java, the piece of code that I used to generate the .bmp file originally.
29,432 bits
0110100101101101...
CB: HugPlant.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
69 6d 70 6f 72 74 20 6a 61 76 61 2e 61 77 74 2e
43 6f 6c 6f 72 3b 0a 0a 0a 70 75 62 6c 69 63 20
63 6c 61 73 73 20 48 75 67 50 6c 61 6e 74 20 7b
0a 09 2f 2f 73 65 6e 64 20 65 6d 61 69 6c 20 74
6f 20 70 72 69 7a 65 40 6a 6f 73 68 68 2e 75 67
20 74 6f 20 72 65 63 65 69 76 65 20 79 6f 75 72
20 70 72 69 7a 65 0a 0a 09 70 72 69 76 61 74 65
20 73 74 61 74 69 63 20 64 6f 75 62 6c 65 20 73
63 61 6c 65 46 61 63 74 6f 72 3d 32 30 2e 30 3b
0a 0a 09 70 72 69 76 61 74 65 20 73 74 61 74 69
63 20 69 6e 74 20 67 65 6e 43 6f 6c 6f 72 56 61
6c 75 65 28 69 6e 74 20 6f 6c 64 56 61 6c 2c
Question #1: Comprehensible Compression
Interesting question #1:
29,432 bits
0110100101101101...
CB: HugPlant.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
“Comprehensible” Compression
Question #2: Optimal Compression
Interesting question #2:
Seems plausible that this optimal program would also have structure.
??? bits
0110100101101101...
CB: OptimalHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Optimal Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
Kolmogorov Complexity
Given a target bitstream B, what is the shortest bitstream CB that outputs B.
??? bits
0110100101101101...
CB: OptimalHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Optimal Compression
Kolmogorov Complexity
Given a target bitstream B, what is the shortest bitstream CB that outputs B.
Fact #1: Kolmogorov Complexity is effectively independent of language.
Java Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
??? bits
0010000001110000...
CB: OptimalHP.java
Kolmogorov Complexity (Language Independence)
Fact #1: Kolmogorov Complexity is effectively independent of language.
Paul Hilfinger writes a Python program that is very short and uses weird Python features and I am stumped and cannot think of a similar thing in Java.
Java Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
0010000001110000...
CB: OptimalPythonHPAnd
PythonInterpreter.java
Kolmogorov Complexity (Language Independence)
Fact #1: Kolmogorov Complexity is effectively independent of language.
This is a deep fact!
Java Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
0010000001110000...
CB: OptimalPythonHPAnd
PythonInterpreter.java
Kolmogorov Complexity (Uncomputability)
Fact #2: It is impossible to write a program that even calculates the Kolmogorov Complexity of any bitstream. Proof available here.
??? bits
0110100101101101...
CB: OptimalHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Optimal Compression
Optimal compression algorithm that works for all inputs does not exist! Leads to a logical fallacy.
Space/Time Bounded Compression
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
Question #2: Optimal Compression
Interesting question #2:
Unfortunately the answer is no. This is not possible, even theoretically.
??? bits
0110100101101101...
CB: OptimalHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Optimal Compression
Question #2S: Space Bounded Compression
Interesting question #2S: Can we create a compression algorithm C(B, S) that:
Let’s call this “space bounded compression”.
≤ S bits
0110100101101101...
CB: SpaceBoundedHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Space Bounded
Compression
S
Question #2S: Space Bounded Compression
Interesting question #2S: Can we create a compression algorithm C(B, S) that:
No! Could be used to find KJ(B).
≤ S bits
0110100101101101...
CB: SpaceBoundedHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Space Bounded
Compression
S
Question #2ST: Space/Time Bounded Compression
Interesting question #2ST: Can we create a space/time bounded compression algorithm C(B, S, T) that:
≤ S bits
0110100101101101...
CB: SpaceBoundedHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Space/Time Bounded
Compression
S
T
executes ≤ T lines of bytecode
Question #2ST: Space/Time Bounded Compression
Interesting question #2ST: Can we create a space/time bounded compression algorithm C(B, S, T)? Yes! And here’s an algorithm:
Runtime: O(T x 2S)
≤ S bits
0110100101101101...
CB: SpaceBoundedHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Space/Time Bounded
Compression
S
T
executes ≤ T lines of bytecode
Treating B as a constant.
Question #2ST-E: Space/Time Bounded Compression
Interesting question #2ST-E: Can we create an efficient space/time bounded compression algorithm?
≤ S bits
0110100101101101...
CB: SpaceBoundedHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Efficient Space/Time Bounded
Compression
S
T
executes ≤ T lines of bytecode
P = NP?
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
Surprising Fact
An efficient solution to any of these three problems:
Would also give an efficient space/time bounded compression algorithm.
Why?
*: And also tens of thousands of other related problems.
3SAT and Space/Time Bounded Compression
Example:
Visual Reduction
LONGEST_PATH can be used to solve Bounded Space/Time Compression.
B: |
S: 20,000 |
T: 10,000,000 |
Bounded Space/Time Compression
Preprocess
LONGEST_PATH
G
path of weight > W
Postprocess
Java code of length < 20,000 bits that outputs HugPlant in 10,000,000 executed bytecode lines.
3SAT and Space/Time Bounded Compression
Example:
How do we know X can be turned into a longest paths problem?
P = NP?
Two important classes of yes/no problems:
Examples of problems in P:
Examples of problems in NP:
*: Technically it’s problems for a which a “yes” answer is efficiently verifiable.
P = NP?
Two important classes of yes/no problems:
Examples of problems not in NP:
*: Technically it’s problems for a which a “yes” answer is efficiently verifiable.
Totally Shocking Fact
Every single NP problem reduces to 3SAT.
In other words, any decision problem for which a yes answer can be efficiently verified can be transformed into a 3SAT problem.
This result is by Cook (1971) and Levin (1973). See Cook-Levin Theorem for more.
Open Question in Computer Science
Question posed Stephen Cook in 1971: Are all NP problems also P problems?
One reason to think yes:
See CS170 for a much more thorough and formal treatment of this problem.
“Almost Like Gods”
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
P = NP?
Consensus Opinion (Bill Gasarch Poll, 2012 poll)
Why is opinion generally negative?
What is NP-complete?
NP Complete Problems
It turns out that there are tons of NP problems that all reduce to each other.
Independent Set
Hamiltonian Cycle
Decision version of TSP
Hamiltonian Path
3-SAT
Two-Prime Integer Factorization
Purple: NP-Complete
White: NP, not known to be NP complete.
v → w means
a solver for v
can solve w.
Fun Fact: Mathematical Proofs Are in NP!
Example of NP problem: Is there a proof that the Riemann Hypothesis is true?
If P=NP, then mathematical proof can be automated!
“[A linear or quadratic-time procedure for what we now call NP complete problems would have] consequences of the greatest magnitude. [For such a procedure] would clearly indicate that, despite the unsolvability of the Entscheidungsproblem, the mental effort of the mathematician in the case of yes-or-no questions could be completely replaced by machines.” - Kurt Gödel
One of These Things, Is Not Like The Others
In 2000, the Clay Mathematics Institute set up $1,000,000 prizes for the solution of each of seven problems.
Millenium Prize Problems.
Question #2ST-E: Space and Time Bounded Compression
Back to our earlier quest:
≤ S bits
0110100101101101...
CB: SpaceBoundedHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
Efficient Space/Time Bounded
Compression
S
T
executes ≤ T lines of bytecode
Even More Impressive Consequences
Even More Impressive Consequences
“I have heard it said, with a straight face, that a proof of P = NP would
be important because it would let airlines schedule their flights better, or
shipping companies pack more boxes in their trucks!
If [P = NP], then we could quickly find the smallest Boolean circuits that output (say) a table of historical stock market data, or the human genome.... It seems entirely conceivable that, by analyzing these circuits, we could make an easy fortune on Wall Street, or retrace evolution... For broadly speaking, that which we can compress we can understand, and that which we can understand we can predict.
So if we could solve the general case—if knowing something was tantamount to knowing the shortest efficient description of it—then we would be almost like gods. [Assuming P ≠ NP] is the belief that such power will be forever beyond our reach.”
- Scott Aaronson http://www.scottaaronson.com/papers/npcomplete.pdf
Does Short = Comprehensible?
Lecture 39, CS61B, Spring 2023
Compression
Optimal Compression and Kolmogorov Complexity (172 preview)
Space/Time Bounded Compression
P = NP?
“Almost Like Gods”
Does Short = Comprehensible?
Back to Question #1: Comprehensible Compression
Earlier, we’d hoped for “comprehensible compression”.
Scott Aaronson earlier made an implicit conjecture that a short program will also be comprehensible.
29,432 bits
0110100101101101...
CB: ReadableHP.java
Interpreter
8,389,584 bits
0100001001001101...
B: HugPlant.bmp
“Comprehensible” Compression
Comprehensible Compression of Other Data
Scott also conjectures that if we fed in useful data about the stock market, we’d effectively get back a useful economy simulator.
0110100101101101...
CB: EconomySimulator.java
Interpreter
01001010010100...
B: marketData.csv
“Comprehensible” Compression
All business reports by every business ever along with stock market data.
Some sort of relatively short Java program that acts as a simulator for the world economy that we can read through and understand.
Complexity from Simple Rules
However, there are also reasons to suspect that simplicity might not indicate comprehensibility.
Earlier, we said that compressibility was a rare thing.
Fractals
In the 1970s, Mandelbrot built demonstrations that very short programs could generate highly complex visual patterns.
The Mandelbrot Set
Bounded Space/Time Compression
S
T
???
Short (But Simple)?
...
public class Mandelbrot extends JFrame {
...
for (int y = 0; y < getHeight(); y++) {
for (int x = 0; x < getWidth(); x++) {
zx = zy = 0;
cX = (x - 400) / ZOOM;
cY = (y - 300) / ZOOM;
int iter = MAX_ITER;
while (zx * zx + zy * zy < 4 && iter > 0) {
tmp = zx * zx - zy * zy + cX;
zy = 2.0 * zx * zy + cY;
zx = tmp;
iter--;
}
I.setRGB(x, y, iter | (iter << 8));
}
}
...
}
Hard to say what we learn by looking at this code.
Bounded Space/Time Compression
S
T
Music from Simple Rules
This notion of complexity from simple rules is arguably more interesting (and alarming) when applied towards sound generation.
Music from very short programs (3rd iteration): Youtube