1 of 37

lovingly prepared by Conrad Soon for NUS OrcaCode © 2024

2 of 37

Motivation

  • Answering repeated range queries, possibly with updates
    • Given an array, find minimum number in range multiple times.
    • Given an array, find the sum of numbers in a range multiple times.
    • What if we want to support point updates? Change a[i] to 69 or something?
    • What if we want to support range updates? Add 2 from a[2], a[3], to a[420]?
  • Naive solution
    • O(q) queries over ranges of up to O(n) elements each -> O(nq) operations!
    • a[1] + … + a[7], a[3] + … + a[8]
    • There’s repeated computation here (a[3] + … + a[7]), can we find a better way?
  • We want some sort of range query data structure, which supports these operations as fast as possible
    • Range query
    • Point update, range update

3 of 37

In this talk:

  • Some range query data structures that you won’t see in CS2040S
    • Prefix sum (Most common to appear in LeetCodes)
    • Segment trees (no Fenwick trees, I hate them)
    • Sparse table (very unlikely to see, but fun idea)
    • Square root decomposition (very unlikely to see, but fun idea)
  • Too lazy to find good implementations, but will give you the idea on how to implement.
  • Some problems solved using these data structures
    • Directly, or through some transformation that allows us to solve it.
  • Roughly CS2040S++ difficulty?
  • For the “I am from NUSH CS/RI/HCIRS with NOI Gold” chads, hopefully my math-y formalisation of some stuff will be kinda cool.

4 of 37

Prefix sum (apparently covered by Shawn)

  • Quite a common technique for Leetcodes, for answering repeated range queries
  • Basic idea is this:
    • For an array A, compute an auxiliary array B where B[i] = A[0] + A[1] + … + A[i] in O(n) time.
    • If we are asked to calculate A[15] + A[16] + … + A[25], observe that we can take B[25] - B[14], which gives us ( A[0] + … + A[25] ) - ( A[0] + … + A[14] ) = A[15] + … + A[25]!
  • O(n) precomputation of B, which allows us to do O(1) range queries!
  • Quick question: overkill for only 1 range query?

5 of 37

Variations on a theme

  • What other operations are possible?
    • XOR?
    • Matrix addition?
    • Multiplication? What if I have zeros?
      • A bit more complex.
    • General matrix multiplication? Matrix multiplication w/ non-zero det?
    • Minimum/maximum?
  • Key idea is this: “inverse” exists for all elements
    • Technically you need associativity too, but every operation we’ll see today is associative.
    • If you are a math lover (🤓🤓🤓), you’ll know that this is a group structure

6 of 37

Prefix sum problems

  • https://cses.fi/problemset/task/1646
    • Classical prefix sum
  • https://cses.fi/problemset/task/1650
    • Prefix sum, but with different operation (XOR)
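The XOR task works with the exact same prefix idea, because XOR is its own inverse (x ^ x = 0). A minimal sketch (function names are illustrative, not from any library):

```python
# Prefix XORs: prefix[i] = arr[0] ^ arr[1] ^ ... ^ arr[i-1], computed in O(n).
def prefix_xors(arr):
    n = len(arr)
    prefix = [0] * (n + 1)
    for i in range(1, n + 1):
        prefix[i] = prefix[i - 1] ^ arr[i - 1]
    return prefix

# XOR of arr over the 1-indexed inclusive range [a, b]: the shared prefix
# cancels itself out, exactly like subtraction does for sums.
def range_xor_query(prefix, a, b):
    return prefix[b] ^ prefix[a - 1]
```

Same O(n) precomputation, O(1) per query; only the operation (and its identity, 0) changed.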

7 of 37

Reference implementation

# function to preprocess the input array and compute prefix sums
def prefix_sums(arr):
    n = len(arr)
    prefix = [0] * (n + 1)
    for i in range(1, n + 1):
        prefix[i] = prefix[i - 1] + arr[i - 1]
    return prefix

# function to handle queries using the prefix sum array
# (a and b are 1-indexed, range is inclusive [a, b])
def range_sum_query(prefix, a, b):
    return prefix[b] - prefix[a - 1]

# read input values
n, q = map(int, input().split())  # number of elements and queries
arr = list(map(int, input().split()))  # array of elements

# compute prefix sums
prefix = prefix_sums(arr)

# process each query and print the sum of values in the given range
for _ in range(q):
    a, b = map(int, input().split())
    print(range_sum_query(prefix, a, b))

8 of 37

Recap:

  • We want some sort of range query data structure, which supports these operations as fast as possible
    • Range query
    • Point update, range update
  • We have a data structure that gives us range queries for certain types of operations (ones with inverse).
  • What about updates? Updating is still O(n) for each update, assume O(q) updates, O(nq) overall!
  • Can we support more types of operations?
    • What constraints can we relax? Can we do away with requirement for inverse?

9 of 37

Segment trees

  • Good-ish time complexity for range queries? Yes, O(log n) (vs O(1) for prefix sums).
  • Better time complexity for updates? Yes, O(log n) with segment trees.
  • More types of operations? Yes, segment trees only require associativity.
    • Minimum is now possible!
  • So what actually is a segment tree?
    • Key idea: break computation down into blocks we can reuse.
  • Let’s see how we can derive one.

10 of 37

Let’s think of how to break up the calculation

  • Use inclusive left, exclusive right semantics
    • Makes calculation easier! Minimises off-by-one index errors from calculating mid-point.
  • Given already computed sums of
    • [0,16) (length 16 blocks)
    • [0,8), [8,16) (length 8 blocks)
    • [0,4), [4,8), [8,12), [12,16) (length 4 blocks)
    • [0,2), [2,4), [4,6), [6,8), [8,10), [10,12), [12,14), [14,16) (length 2 blocks)
    • [0,1), [1,2), [2,3), [3,4), [4,5), [5,6), [6,7), [7,8), [8,9), [9,10), [10,11), [11,12), [12,13), [13,14), [14,15), [15,16) (length 1 block)
  • Suppose I want [3,12), can you find a way to minimise the amount of blocks you use to compute a range?
    • You could simply do [3,4) + [4,5) + … + [11,12), but then there’s no point in maintaining such a complicated structure. Just naively add.

11 of 37

TL;DR: Please draw a pretty picture

12 of 37

Let’s phrase it algorithmically

  • Node is a struct that holds 5 values
    • L, left end of range inclusive
    • R, right end of range exclusive
    • Sum of values within the range [L, R)
    • Pointer to left child node [L, M)
    • Pointer to right child node [M, R)
  • sum(i, j, Node*) is a function that given a range [i, j), returns you the sum of all the points in the Node’s range which intersect with it.
  • If I have node N representing range [0,7) with sum a[0] + … + a[6], sum(2, 10, N) should return me the sum over [0,7) intersect [2,10), or [2,7).
  • Calling sum(i, j, [0,16)) should give me the answer to the range query!

13 of 37

Let’s phrase it algorithmically

  • sum has recursive structure!
    • sum(i, j, Node* n) = sum(i, j, leftChild) + sum(i, j, rightChild)
  • If recursive: need to think about base cases.
    • If node is fully contained within [i,j), just return the sum
    • If node is fully disjoint from [i,j) return 0.
    • Otherwise, node has intersection with the range requested, and need to recurse to children.

14 of 37

Back to the drawing board.

  • sum(i, j, Node* n) = sum(i, j, leftChild) + sum(i, j, rightChild)
  • If node is fully contained within [i,j), just return the sum
  • If node is fully disjoint from [i,j) return 0.

15 of 37

Claim: range queries in O(log n), NOT O(n)!

  • Recursion looks like it doubles each level, maybe O(n) overall?
    • However, one child is guaranteed to hit base case and return immediately.
  • At each level, it only does at most 4 operations, O(log n) levels, 4 log n operations total.
  • Informally, proof by 👀 observation, formally, proof by 🤓 induction.
  • For no insight into how the algo works at all: proof by 🤢 contradiction
    • Suppose >= 5 operations at a level
    • This means at previous level, at least 3 operations had to break into children.
    • This means requested range intersected partially with 3 ranges.
    • But the ranges at a level are disjoint and ordered: if the query partially intersects three of them, the middle one must be wholly contained in the query, so it would have hit a base case instead.
    • Contradiction!

16 of 37

Wait, how do we calculate the blocks in the first place?

  • Naively computing every single block -> O(N log N)
  • Reusing previously calculated results -> O(N)

17 of 37

Claim: updates are in log(n) now too!

  • How can we update values?
  • Since now our computation is broken up into blocks, we only have to update the blocks!
  • There are only log(n) blocks which contain the relevant sums (imagine drawing a line from the updated leaf up to the root).
  • Furthermore, updates can be efficiently propagated.
    • Recomputing a block can be done in O(1) time (update the relevant child block, and then combine the two children’s sums).
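Putting the last few slides together, here is a minimal node-based sketch (class and function names are my own, not a standard API): the constructor reuses children’s sums for the O(n) build, sum queries implement the two base cases, and point updates recombine on the way back up.

```python
class Node:
    # Segment tree node over [L, R) (inclusive-exclusive), holding the five
    # values from the slides: L, R, sum, and pointers to the two children.
    def __init__(self, L, R, arr):
        self.L, self.R = L, R
        if R - L == 1:                       # leaf: a single element
            self.sum = arr[L]
            self.left = self.right = None
        else:                                # internal: reuse children's sums -> O(n) build
            M = (L + R) // 2
            self.left = Node(L, M, arr)
            self.right = Node(M, R, arr)
            self.sum = self.left.sum + self.right.sum

def query(node, i, j):
    # Sum over [i, j) intersected with the node's range [L, R).
    if j <= node.L or node.R <= i:           # fully disjoint: return the identity
        return 0
    if i <= node.L and node.R <= j:          # fully contained: reuse the whole block
        return node.sum
    return query(node.left, i, j) + query(node.right, i, j)

def point_update(node, i, value):
    # Set a[i] = value; only the O(log n) blocks containing index i change.
    if node.R - node.L == 1:
        node.sum = value
        return
    if i < node.left.R:
        point_update(node.left, i, value)
    else:
        point_update(node.right, i, value)
    node.sum = node.left.sum + node.right.sum   # recombine children in O(1)
```

Usage: `root = Node(0, 16, arr)`, then `query(root, 3, 12)` and `point_update(root, 5, 69)`.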

18 of 37

Claim: updates are in log(n) now too!

19 of 37

Reference implementation

  • This slide is left as an exercise to the reader.
  • Backing data structure can be array or node-based tree.
    • Arrays are faster but have to allocate all nodes.
    • Node-based trees are slower but can be allocated only when needed.
  • Suppose the range is [0, 1e9): an array-based tree is impractical (billions of entries), but a node-based tree allocated on demand is possible!

20 of 37

Recap

  • We want some sort of range query data structure, which supports these operations as fast as possible
    • ✅ Range query (log n)
    • ✅ Point update (log n), ❓range update
  • More variety of operations + efficient point updates, slightly worse queries.
  • Suppose we range update naively:
    • Run point update for each point in query.
    • If O(n) range update queries of length O(n) -> O(n^2 log n)
    • This is worse than the naive O(n^2) run time!
  • Our data structure doesn’t support efficient range updates yet!
    • But it can with lazy propagation

21 of 37

Lazy propagation

  • Similar to how we broke up computation into “blocks”, let’s break up updates into “blocks”.
  • We store a variable aggAdd, indicating the “aggregate add” that should be applied to it.
    • If we want to add 1 to range [0, 8), node.aggAdd = 1.
    • We also store 1 flag, indicating whether there is a pending aggAdd.
  • Can we derive how this should work?

22 of 37

Lazy propagation

  • update(0, 8, 2)
  • query(0, 8)
  • Need to define how aggAdd interacts with sum! (sum += aggAdd * length of range)
  • query(0, 4)
  • Need to define how aggAdd is pushed down to children!
  • update(0, 2, 4)
  • Need to define how aggAdd interacts with other aggAdds!
  • query(0, 4)
  • Need to recalculate sum after an update!

23 of 37

Let’s phrase it algorithmically

  • Think about a few things:
    • How does aggAdd get applied with sum? (sum += aggAdd * length of range)
    • How does aggAdd get composed with another aggAdd from above? (aggAdd += aggAddParent)
    • How does aggAdd get split up and pushed down to children? (aggAddChild += aggAddParent, composing with any pending add the child already has)
  • Whenever you call sum() on a node:
    • If pending aggAdd, apply aggAdd on the sum.
    • Push down the aggAdd to the 2 children node.
    • Recurse on sum() as per normal.
  • Whenever you call update() on a node:
    • Similar recursive structure as sum(), with same base cases.
    • If node is fully contained within range, compose aggAdd to current node’s aggAdd.
    • If node is fully disjoint from range, do nothing.
    • Call update(leftChild) and update(rightChild).
    • Recalculate value of sum() after calling sum(leftChild) and sum(rightChild)
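The bullets above can be sketched as a self-contained lazy segment tree for range add + range sum (names are mine; instead of a separate flag, `aggAdd != 0` serves as the pending-update flag, and all values start at 0 as in the update(0, 8, 2) example):

```python
class LazyNode:
    # Segment tree over [L, R) with range add + range sum via lazy propagation.
    # Invariant: self.sum is already correct for this node's range; aggAdd is
    # pending for the children only. All array values start at 0.
    def __init__(self, L, R):
        self.L, self.R = L, R
        self.sum = 0
        self.aggAdd = 0
        if R - L > 1:
            M = (L + R) // 2
            self.left, self.right = LazyNode(L, M), LazyNode(M, R)

    def _apply(self, v):
        # aggAdd interacts with sum (sum += v * length), and composes with
        # any aggAdd already pending here (aggAdd += v).
        self.sum += v * (self.R - self.L)
        self.aggAdd += v

    def _push_down(self):
        # Hand the pending add to the two children. (Leaves never reach here:
        # a length-1 range intersecting a query is always fully contained.)
        if self.aggAdd:
            self.left._apply(self.aggAdd)
            self.right._apply(self.aggAdd)
            self.aggAdd = 0

    def update(self, i, j, v):
        # Add v to every element of [i, j).
        if j <= self.L or self.R <= i:          # fully disjoint: do nothing
            return
        if i <= self.L and self.R <= j:         # fully contained: record lazily
            self._apply(v)
            return
        self._push_down()
        self.left.update(i, j, v)
        self.right.update(i, j, v)
        self.sum = self.left.sum + self.right.sum   # recalculate after update

    def query(self, i, j):
        if j <= self.L or self.R <= i:
            return 0
        if i <= self.L and self.R <= j:
            return self.sum
        self._push_down()
        return self.left.query(i, j) + self.right.query(i, j)
```

Running the slide’s sequence: `t = LazyNode(0, 8)`, `t.update(0, 8, 2)`, then `t.query(0, 4)` forces the pending aggAdd to be pushed down exactly once.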

24 of 37

Back to the drawing board.

  • Whenever you call sum() on a node:
    • If pending operation, apply aggAdd on the sum.
    • Push down the aggAdd to the 2 children node.
    • Call sum() as per normal.
  • Whenever you call update() on a node:
    • Similar to sum(): same recursive structure; base cases are when the node is fully contained or fully disjoint.
    • Call update(leftChild) and update(rightChild).
    • Recalculate sum from sum(left), sum(right).

  • update(0, 8, 2)
  • query(0, 8)
  • query(0, 4)
  • update(0, 2, 2)
  • query(0, 4)

25 of 37

Claim: range update is log(n)

  • Similar argument as for sum(), since similar recursive structures.
  • Each level you only do at most 4 operations, therefore O(log n).
  • Additionally, aggAdd only adds O(1) to each “block” operation in sum(), so it does not affect the time complexity of sum().

26 of 37

Recap

  • We want some sort of range query data structure, which supports these operations as fast as possible
    • ✅ Range query (log n)
    • ✅ Point update (log n), ✅ range update log(n)
  • What requirements do we need?
    • No longer need existence of an inverse!
    • To define range updates:
      • aggOpp.apply(sum): How to apply aggregate operation to the sum?
      • aggOpp.compose(aggOpp): How to compose aggregate operation?
      • aggOpp.pushDown(Node): How to push down an aggregate operation to another node?
  • Yay! We have found a fantastic range query data structure!

27 of 37

Variations on a theme

  • Given an array, support these operations:
    • Range minimum query, with range update
    • Range sum query, with range update
    • XOR query, with range update
    • So on and so forth!

28 of 37

Why is it O(log n) and not O(n)?

  • Intuitively, lazy propagation “pushes” down operations.
  • But aren’t there in total 2N nodes?
  • Why is it not O(2N)?
  • Claim: amortisation
  • You only “push down” when you need it.
  • To force N push downs, you need at least N operations.
  • Spread out, this means 1 extra operation per query/update.
  • In other words, only amortised O(1) work to push it down.

29 of 37

Read more for lazy propagation

  • Segment Tree : The general concept behind Lazy Propagation - Codeforces
    • Excellent comment that dissects lazy propagation mathematically (personally I felt this helped me understand it the best because I am a setpilled mathcel)

30 of 37

(Bonus 1) Square root decomposition

  • I have literally never used this before, but I think the idea itself is cool.
  • No special structure required, gives O(sqrt n) point updates, O(sqrt n) range queries, O(n + sqrt(n)) memory usage.
  • Break each range query up into this:
    • Dangling left + block_1 + … + block_k + Dangling right
    • O(sqrt(n)) blocks with O(1) time for sum, dangling left and right calculated in O(sqrt(n)) time.
    • Overall, O(sqrt(n))
  • Total precomputation is sqrt(N) blocks * sqrt(N) operations = O(N)
    • Chunk array up into sqrt(N) blocks, each block has sqrt(N) elements.
    • For each block, calculate the total sum of the block in O(sqrt(N)).
  • Updating just recompute affected block.
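A minimal sketch of the whole scheme (class name and method names are mine): blocks of length ~sqrt(n), point updates recompute only the affected block, and queries do dangling left + whole blocks + dangling right.

```python
import math

class SqrtDecomp:
    # Chunk the array into blocks of length ~sqrt(n) and precompute each
    # block's sum: O(n) precomputation, O(n + sqrt(n)) memory.
    def __init__(self, arr):
        self.arr = list(arr)
        self.b = max(1, math.isqrt(len(arr)))           # block length
        self.blocks = [sum(self.arr[i:i + self.b])
                       for i in range(0, len(self.arr), self.b)]

    def update(self, i, value):
        # Point update: just recompute the one affected block, O(sqrt n).
        self.arr[i] = value
        k = i // self.b
        self.blocks[k] = sum(self.arr[k * self.b:(k + 1) * self.b])

    def query(self, i, j):
        # Sum over [i, j): dangling left + whole blocks + dangling right.
        total = 0
        while i < j and i % self.b != 0:                # dangling left, O(sqrt n)
            total += self.arr[i]
            i += 1
        while j - i >= self.b:                          # O(sqrt n) blocks, O(1) each
            total += self.blocks[i // self.b]
            i += self.b
        while i < j:                                    # dangling right, O(sqrt n)
            total += self.arr[i]
            i += 1
        return total
```

(The point update could even be O(1) for sums by adjusting the block total with the delta; the recompute version above matches the slide and also works for operations like min.)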

31 of 37

(Bonus 2) Sparse table

  • I have seen this used literally only once
  • Requires idempotence and commutativity, and the table is static (a point update can touch up to O(n) entries), but gives O(1) range query with O(n log n) memory usage.
  • Requires precomputation of O(n log n): for each index i, calculate [i,i+1), [i,i+2), [i,i+4), etc.
  • Key idea is this:
    • Because operations are idempotent, it doesn’t matter if you apply them twice!
    • If I want to find [4,25), combine [4, 4+16 = 20) with [25 - 16 = 9, 25)!
  • Sparse table works for range minimum but not for range sum!
    • Minimum is idempotent (min(x,min(x,y)) = min(x,y))
    • Sum is not idempotent! (sum(x,sum(x,y)) != sum(x,y))
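A minimal sketch for range minimum with [i, j) semantics (function names are mine): each table row doubles the block length by combining two halves from the row below, and a query covers [i, j) with two overlapping power-of-two blocks, exactly as in the [4, 25) example.

```python
def build_sparse(arr):
    # table[k][i] = min of arr[i : i + 2^k]; O(n log n) time and memory.
    n = len(arr)
    table = [list(arr)]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        # Combine two adjacent 2^(k-1) blocks into one 2^k block.
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << k) + 1)])
        k += 1
    return table

def range_min(table, i, j):
    # Min over [i, j): cover it with two (possibly overlapping) 2^k blocks.
    # The overlap is harmless precisely because min is idempotent.
    k = (j - i).bit_length() - 1        # largest k with 2^k <= j - i
    return min(table[k][i], table[k][j - (1 << k)])
```

For [4, 25): k = 4, so it combines table[4][4] (covering [4, 20)) with table[4][9] (covering [9, 25)).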

32 of 37

LCA

  • LCA (Lowest Common Ancestor) is a “classical” algo problem.
  • Given a tree and two nodes, find their “lowest common ancestor”.
  • A lot of funni ways to tackle: some of the common ones you’ll see are
    • Naive (iterate upwards and find the first common ancestor)
    • Binary lifting (as above, but more optimised)
  • Since this is a range query data structures talk, is there a way to solve this with range query data structures?

33 of 37

LCAs: RMQs in Disguise

  • Lowest common ancestor is secretly a range minimum query problem!
  • Do a DFS Euler tour of the tree, recording each node (and its depth) every time it is visited.
  • The LCA of two nodes is the minimum-depth node in the tour range between their first occurrences!
    • Above statements make no sense as words, Conrad go draw on white board.
  • Later on there will be a tutorial question in CS2040S about LCA:
    • Common solution most people will iterate to is O(n log n) precomp, O(log n) queries
    • Then they will wonder “wait wtf the ideal time complexity for this is O(1) query, how??”
    • Enlightened NUS OrcaCode enjoyers will know that this time complexity is achievable by viewing LCA as an RMQ and using sparse table for O(1) query.
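A sketch of the reduction (names are mine; the whiteboard version is authoritative). `min()` over the tour slice stands in for a real RMQ structure here — substituting a sparse table over the same `(depth, node)` pairs is what gets the O(1) queries.

```python
def build_euler(adj, root):
    # Euler tour: append (depth, node) when we first enter a node AND again
    # after returning from each child; first[u] is u's first position.
    euler, first = [], {}
    def dfs(u, parent, d):
        first[u] = len(euler)
        euler.append((d, u))
        for v in adj[u]:
            if v != parent:
                dfs(v, u, d + 1)
                euler.append((d, u))    # revisit u after each child
    dfs(root, -1, 0)
    return euler, first

def lca(euler, first, u, v):
    i, j = sorted((first[u], first[v]))
    # The minimum-depth entry between the first occurrences is the LCA.
    # Replace this slice-min with any RMQ structure for fast queries.
    return min(euler[i:j + 1])[1]
```

(Recursive DFS keeps the sketch short; a deep tree would want an explicit stack.)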

34 of 37

LCAs: RMQs in Disguise

35 of 37

This is what I procrastinated on to prepare this talk

  • please send help

36 of 37

For the grouptheorypilled mathcels: a summary

37 of 37

Read more

  • https://cp-algorithms.com/data_structures/segment_tree.html
    • Excellent reference website w/ good implementations
  • https://cses.fi/problemset/
    • Excellent problem set, delineated by topic (check out the Range Query section in particular)