Playing around with a data compression algorithm I came upon a problem that I was pretty sure had been studied before. Turns out it was the set cover problem.

It is conceptually very simple. Stated informally: imagine we have a set of light switches S and a set of lights L. Each switch in S turns on an arbitrary subset of the lights in L. The problem is to turn on all of the lights L using the smallest number of switches in S.

Here is an example problem representation:

{1 #{1 4} 2 #{2 3}} |

The map can be interpreted as “switch 1 turns on lights 1 and 4, switch 2 turns on lights 2 and 3”. Assuming there are only the four lights, the only solution would be the set of switches #{1 2} since that is the smallest set that turns on all four lights.

Here is a larger problem which illustrates a key principle:

{1 #{1 2 3} 2 #{4 5 6} 3 #{2 3 4 5} 4 #{1} 5 #{6}} |

If one tries to solve this problem, it is intuitive to start with switch 3, since it turns on the most lights (4 out of the 6). However, once switch 3 is on, there is no way to not choose two more switches for a solution of size 3. The optimal solution is to chose switches 1 and 2 for a size of 2.

It turns out that the set cover problem is NP-complete, which for our purposes means that finding the optimal solution is at least exponential in the size of the problem.

However, using the heuristic described above of iteratively choosing the switch that turns on the most remaining lights is not all bad. This is called a “greedy” algorithm since it always takes the biggest piece of the pie, even if that forces it into suboptimal solutions. In exchange for a risk of suboptimality, the greedy algorithm runs subexponentially.

Below is a visualization comparing the solution quality of both approaches. Each circle represents a light, and each color represents a single switch. The lights are colored by the switches that activate them in the found solution. In each cluster of lights, the top row represents the greedy solution and the bottom the optimal solution:

We’ve already explained how we represent a problem instance as a map of switches to the sets of lights they activate.

To create the above graphic, we solve in two ways: greedy or brute.

For greedy, we iteratively find the switch that activates the most off-lights. If we keep a set of current off-lights the size of the intersection of each switch’s lights with the current off-lights is what we are after. We can select the largest using max-key.

(apply max-key (fn [[k v]] (count (clojure.set/intersection l-off ;set of lights currently still off v))) ;set of lights switch k turns on S) ;S is the switches=>lights map |

We iterate the procedure until no more lights are off. Keeping track of which switches were used gives us our solution.

For brute, we simply generate all combinations of switches, filter for the solutions, and take the smallest size combination. This is not feasible for larger problems since the total number of combinations grows exponentially.

The following code generates all combinations of switches and filters (retains) the solutions:

(filter (fn [c] (= lights ;set of all lights (apply clojure.set/union (vals (select-keys S c))))) (for [n (range 1 (count S)) c (combos n switches))] (set c))) |

Happy set covering!

Refs: