1 of 47

Topic 2:�Histograms, Sets, and Maps

Competitive Programming I (Fall 2020)

Ninghui Li

2 of 47

Example Problems

Homework Problems

  • Anagram Counting
  • Exam Seating
  • Affine Names (Similar to USACO Cities and States)

3 of 47

Additional Problems

4 of 47

Kattis: Baloni

5 of 47

Kattis: Baloni

There are N (1<=N<=10^6) balloons floating, lined up from left to right, each at height H_i, where 1<=H_i<=10^6. One wants to pop all of them by shooting arrows at chosen heights. Each arrow travels from left to right. When an arrow touches a balloon, the balloon pops and disappears and the arrow continues its way at a height decreased by 1. Output the minimal number of arrows that is needed.

Input: Output:

5 3

4 5 2 1 4

6 of 47

Baloni: Illustration

5

4

3

2

1

Arrow 1

Arrow 2

Arrow 3

7 of 47

Baloni: Analysis

One brute-force approach is to simulate effect of each arrow.

  • Scanning from left to right, a new balloon requires a new arrow, follow the trace of this arrow, which may destroy other balloons. Repeat while there are balloons left. What is the complexity of this approach?
  • Simulating one arrow takes O(N) time, and we may need O(N) arrows, for a complexity of O(N^2)

How to avoid scanning through all balloons many times?

Simulate multiple arrows at the same time! How to do that efficiently?

Use a histogram to maintain the state of multiple arrows (mapping height to number of arrows travelling at that height). For each new baloon, either it can be popped, or a new arrow is needed.

8 of 47

Java Code for Baloni

public static void main(String[] args) throws IOException {

Scanner in = new Scanner(System.in);

int N = in.nextInt(), ans = 0;

int[] h = new int[N], a = new int[1000001];

for (int i=0; i<N; i++) { h[i] = in.nextInt(); }

for (int i=0; i<N; i++) {

int hb = h[i];

if (a[hb] == 0) { ans++; a[hb-1]++; }

else { a[hb]--; a[hb-1]++; }

}

System.out.println(ans);

}

9 of 47

Set and Map

10 of 47

Set

  • A set is a collection containing no duplicate elements
  • A set supports the following operations efficiently:
    • Testing for membership, Adding elements, Removing elements
  • An ordered set can order the elements in it by some order, and thus additionally support efficient
    • first, last, ordered traversal, etc.
  • Set operations:
    • Union, Intersection, Difference, Subset

11 of 47

Methods in The Java Set Interface

boolean contains (E e) // member test

boolean add (E e) // adding, enforces no-dupicates

boolean remove (Object o)

boolean isEmpty ()

int size ()

Iterator<E> iterator ()

boolean containsAll (Collection<E> c) // subset test

boolean addAll (Collection<E> c) // set union

boolean removeAll (Collection<E> c) // set difference

boolean retainAll (Collection<E> c) // set intersection

12 of 47

Java Classes Implementing Set

HashSet

  • Implemented using hashing
  • Methods add/remove/contains/size takes O(1) time

TreeSet

  • Implemented using a balanced binary search tree
  • Implement SortedSet
    • first() : Returns the first (lowest) element
    • last() : Returns the last (highest) element
    • headSet/tailSet/subSet (returns a portion of the set)
  • Methods add/remove/contains/first/last takes O(log N) time

13 of 47

C++ STL Classes for set

  • unordered_set (hash-based)
    • Methods such insert, erase, find, count, … take O(1) time

  • set (ordered)
    • Methods such as insert, erase, find, count, lower_bound, upper_bound take O(log N) time

14 of 47

Map

  • Map is a set of ordered pairs, each (key, value)
  • In a given Map, there are no duplicate keys
  • Each key is mapped to one value
  • Different keys can be mapped to the same value
  • Can be used to implement multi-set, i.e., a collection of things where one thing can appear multiple times.

  • An array can be viewed as a Map from a set of integer indices to a set of values

15 of 47

Some Methods of java.util.Map<K, V>

// K is the key type

// V is the value type

V get (Object key) // may return null

V put (K key, V value) // returns previous value or null

V remove (Object key) // returns previous value or null

boolean isEmpty ()

int size ()

containsKey(Object key)

Set<K> keySet()

Set<Map.Entry<K,V>> entrySet()

16 of 47

Using Map:

  • A helpful method: p.merge(k, v, Function) does the following

e = p.get(k);

if (e==null) {

p.put(k,v);

return v;

} else {

e=Function(e,v);

p.put(k,e);

return e;

}

17 of 47

Java Classes Implementing Map

  • HashTable (HashMap supporting multi-threaded access)
    • Not needed in competitive programming
  • HashMap
    • (Amortized) Constant, i.e., O(1), time for get/put/remove
  • TreeMap
    • Implemented SortedMap
    • Have firstKey(), lastKey(), ...
    • O(log N) time for get/put/remove

18 of 47

C++ STL classes for Map

  • unordered_map

  • map

  • We will show usages of them in sample code

19 of 47

Using Map for Histograms

  • We want to figure out for each value, how many time it has been added. This is a histogram.
  • When the element is integer of a limited range, we can use an array.
  • When the element can be very large, using Array takes too much space.
  • We can use HashMap or TreeMap.
    • Important methods: put(k,v), get(k), size()
    • For HashMap, these operations take constant time on average.
    • For TreeMap, put/get takes time O(log N), where N is the number of keys in the map, which is close to constant .

20 of 47

21 of 47

Given an array of integers and an integer k, find the total number of continuous subarrays whose sum equals to k.

The length of the array in range [1, 20,000]. Range of numbers in the array is [-1000, 1000] and the range of k is [-1e7, 1e7].

Input: nums = [1, -1, 4, 2, -2, 2, 1], k = 6 Output: ?

Answer: 5 [1, -1, 4, 2] [4, 2] [4, 2, -2, 2] [1, -1, 4, 2, -2, 2]

[-1, 4, 2, -2, 2, 1]

22 of 47

Subarray Sum Equals K: Take One (Bruteforce)

1 public int subarraySum(int[] nums, int k) {

2 int count=0, n=nums.length;

3 for (int i=0; i<n; i++) for (int j=i+1; j<=n; j++) {

4 int subsum = 0;

5 for (int p=i; p<j; p++) subsum += nums[p];

6 if (subsum == k) count++;

7 }

8 return count;

9 }

What is the time complexity of this approach?

A: O(N) B: O(N^2) C: O(N^3)

D: O(N^2 K) E: O(N K^2)

What computation are repeated?

How to avoid them?

Main computation is lines 5, which is executed O(N^2) times (about N^2/2 times). Lines 5 is another loop that executes O(N) times (perhaps around N/3 on average). Overall complexity is thus O(N^3).

23 of 47

Analysis: LeetCode 560 Subarray Sum Equal K

  • The Naive algorithm is to check all subarrays A[i..j], compute their sums, and check whether it works.
  • There are O(N^2) pairs to check, and O(N) to compute sum, so total complexity is O(N^3).
  • Observation: The sum of a subarray A[range(i,j)] (all elements from A[i] to A[j-1]), equals the difference of prefix sums: S_j-S_i.
  • If we store all prefix sums in an array, computing subarray sums takes constant time.

24 of 47

Subarray Sum Equal K: Take Two (Prefix-sum array)

  1. public int subarraySum(int[] nums, int k) {
  2. int count=0, n=nums.length;
  3. int[] psums = new int[n+1]; // prefix sum array
  4. for (int i=0; i<n; i++) psums[i+1]=psums[i]+nums[i];
  5. for (int i=0; i<n; i++) for (int j=i+1; j<=n; j++) {
  6. if (/* what goes in here? */) count++;
  7. }
  8. return count;
  9. }

What is the time complexity of this approach?

A: O(N) B: O(N^2) C: O(N K) D: O(N^2 K)

Line 6: (psums[j]-psums[i]==k)

Main computation is on line 6, which is executed O(N^2) times, and takes constant time. Overall complexity is thus O(N^2).

Can we speed it up further?

25 of 47

Subarray Sum Equal K: Analysis

  • Take-two checks sums of each of O(N^2) subarrays. Can we avoid checking each one. Let us look at an example (k=6):

num

1

-1

4

2

-2

2

1

psums

0

1

0

4

6

4

6

7

# of subarrays with sum=k ending at this index

?

?

?

?

?

?

?

?

0 0 0 0 2 0 2 1

Approach in take 2 looks through all psums[i] and counting how many of them are psums[j]-k. Can we do that faster?

26 of 47

Subarray Sum Equal K: Take Three: Prefix Histograms

public int subarraySum(int[] nums, int k) {

int count=0, rsum=0, MAXS=2000000;

int[] hist=new int[2*MAXS+1];

hist[MAXS] = 1; // hist[x] record number of i’s psums[i]=x-MAXS;

for (int x : nums) {

rsum += x;

count += hist[rsum + MAXS - k];

hist[rsum + MAXS] ++;

}

return count;

// should this code pass?

}

This code should not pass because array length is up to 20000, and each element is in [-1000,1000]. Thus MAXS should be 2*10^7 instead of 2*10^6.

Using MAXS=2*10^7 fails in leetcode. Using MAXS=2*10^6 passes.

We should not use array.

27 of 47

Take Four: Using HashMap for Histogram

Debug Time: Find the Bugs in the Code.

  1. public int subarraySum(int[] nums, int k) {
  2. HashMap<Integer,Integer> s=new HashMap<Integer,Integer>();
  3. int sum = 0, count = 0;
  4. for (int x : nums) {
  5. sum += x;
  6. count += s.get(k-sum);
  7. s.put(sum, 1);
  8. }
  9. return count;
  10. }

Bugs:

Before line 4, should add s.put(0,1);

Line 6 should be something like

Integer x = s.get(sum-k);

If (x != null) count += x;

Line 7 should be

s.merge(sum, 1, Integer::sum);

Time complexity is linear, because line 5-7 are executed N times, and the map operations take constant time.

28 of 47

Lesson from LeetCode 560

  • Any subarray sum can be computed as difference of two prefix sums
  • The information of how many time a prefix sum has appeared can be maintained as a histogram
  • How to implement a histogram depends on K
    • When K is small, uses an array, where R[i] is the number of prefix sums with remaining mod K equalling i
    • When K is large, we use a Map.
  • Thinking about counting how many solutions ending at an index is an important generic technique.

29 of 47

30 of 47

USACO: Cities and States

Given N cities, each has a city name and a two-letter state code. We say that city A and city B are related if the first two letters of A’s name is the same as B’s state code and vice versa, and A and B are from different states. For example:

FLINT, MI MIAMI, FL

Output the number of pairs of cities that are related.

1 ≤ N ≤ 200,000

31 of 47

Cities and States Analysis

  • Take one: Consider each pair of cities, check whether they are related.
  • What is the time complexity?
    • O(N^2), not efficient enough.
  • We have to avoid checking every pairs.
  • For each city, can we quickly find which cities are related to it?

32 of 47

Cities and States Analysis (Continued)

  • Given “FLINT, MI” what features must a city name have to be related to it?
    • It must be “MI..., FL”. Can we quickly find these cities?
    • We can use an index, using “MIFL”, using a Map.
    • Thus “FLINT, MI” will be put under key “FLMI”, and any name under the key “MIFL” will be related to it.
  • How to deal with the case that the cities must be from different states?
    • Do not consider cities like “XY… XY”

33 of 47

A Key Idea

  • Each city has two keys, it is put into the map under one key, and to look for the other city, look using the other key

34 of 47

Solution Code in Java

public static void main(String[] args) throws IOException {

// initializing in and out

int N = in.nextInt(), ans = 0;

Map<String, Integer> ind = new HashMap();

for (int i = 0; i < N; i++) {

String ct = in.next().substring(0,2), st = in.next();

if (ct.equals(st)) continue; // skip XY… XY

String key1 = ct + st, key2 = st + ct;

ans += ind.getOrDefault(key1, 0); // looking for matches

ind.merge(key2, 1, Integer::sum); // adding it

}

out.println(ans);

out.close();

}

35 of 47

Solution Code in C++11

int n, ans=0;

cin >> n;

map<string, int> d;

for (int i=0; i<n; i++) {

string cname, st, cn;

cin >> cname >> st;

cn = cname.substr(0,2);

if (cn != st) ans += d[cn+st];

d[st+cn]++;

}

cout << ans;

36 of 47

LeetCode: Maximum Frequency Stack

(Optional)

37 of 47

What is a Stack?

  • A Stack supports the following two operations
    • push(x) stores the value x
    • pop() removes and returns the value that is pushed most recently.
  • For example, if we have
    • push(2) push(3) push(2) push(1) pop() returns 1 pop() returns 2 push(3) pop() returns 3 pop() returns ?

38 of 47

Maximum Frequency Stack

  • Implement a data structure that supports the following two operations
  • push(int x) Add x to the data
  • pop()
    • returns and removes the most recent addition of the value that has been added the most number of times. If there is a tie, remove the value that is added most recently
  • For example, after push(5), push(7), push(5), push(7), push(4), push(5).
  • The sequence of values that we can get from pop()’s is

5

7

5

7

4

5

5, 7, 5, 4, 7, 5

39 of 47

Maximum Frequency Stack (Take One: Inefficient)

Use an dynamically sized array, e.g., ArrayList in Java and vector in C++

  • For each push, add to the end
    • Efficient!
  • For each pop
    • Scan through the array, find the last appearance of an element that appears the most number of times, remove and return that
      • Takes time linear in the array size

40 of 47

Maximum Frequency Stack (Take Two)

  • For pop(), how to quickly find the element that appeared the most?
  • We can use an ordered set to store information about each element, e.g., an ordered set of
    • [number_of_occurrences, element]
    • We can select the last element in the ordered set
  • How to support tie-breaking?
    • Use a counter for push (called that time)
    • [number_of_occurences, time of most recent addition, element]
    • Selecting the last element gives the right element.

41 of 47

Maximum Frequency Stack (Take Two, continued)

  • How to update the ordered set after pop()?
    • We need to find the next most recent addition time of that element.
    • That means we need to store all addition times of that element in another data structure.
  • We need two data structures:
    • An ordered set of triples (count, last addition time, value)
    • For each element, a list of added times (what data structure to use?)
  • After pop() of one element, find the next to last addition of the value, add a new triple to the ordered set.

42 of 47

Maximum Frequency Stack (Take Two, continued)

  • How to update the two data structures when a push(int x) happens?
    • Ordered set of triples
    • Map of value to list of addition times
  • Updating the map is easy.
    • Find the list, append the time
  • Update the set is a bit more complicated
    • From the map, we have all information needed to determine the element in the set corresponding to x.
    • We can remove the element, then add a new one.

43 of 47

Java Code

class FreqStack {

// set of [Frequency, Last Index, Value]

private TreeSet<int[]> data;

// Maps v to list of indices

private HashMap<Integer, ArrayList<Integer>> ind;

private int count = 0; // Used to order push calls

public FreqStack() {

data = new TreeSet<int[]>(comp);

ind = new HashMap<Integer, ArrayList<Integer>>();

}

44 of 47

Java Code for the Comparator

static Comparator<int[]> comp = new Comparator<int[]>() {

public int compare(int[] a, int[] b) {

for (int i=0; i<a.length; i++) {

if (a[i] < b[i]) return -1;

else if (a[i] > b[i]) return 1;

}

return 0;

}

};

45 of 47

The pop() function

public int pop() {

int[] x = data.pollLast();

// x[0]==Frequency, x[1]==Last push, x[2] = value

int v = x[2];

ArrayList<Integer> oc = ind.get(v);

oc.remove(occs.size() - 1);

int sz = occs.size();

if (sz > 0)

data.add(new int[] {sz, oc.get(sz-1), v});

return v;

}

46 of 47

The push() function

public void push(int x) {

ArrayList<Integer> oc = ind.get(x);

if (oc == null) { // First time seeing the value

oc = new ArrayList<Integer>();

ind.put(x, oc);

}

int sz = oc.size();

if (sz > 0) { // Currently non-zero occurrences

data.remove(new int[]{sz, oc.get(sz-1), x});

}

oc.add(count);

data.add(new int[]{sz+1, count, x});

count++;

}

47 of 47

Maximum Frequency Stack (Take three)

Using multiple stacks, one for each frequency

For example, after push(5a), push(7a), push(5b), push(7b), push(4a), push(5c), pop(), pop(), push (4b).

3

5c

2

5b

7b, 4b

1

5a

7a

4a