1
CS61B, Fall 2024 @ UC Berkeley
Slides credit: Josh Hug
Tries
Lecture 28
Tries (Conceptual)
Lecture 28, CS61B, Fall 2024
Tries (Conceptual)
Trie Implementation and�Performance
Alternate Child Tracking Strategies
Trie String Operations
Autocomplete
Trie Summary
Abstract Data Types vs. Specific Implementations
There are many ways to implement an abstract data type.
PQ
List
Set
Map
DisjointSets
Separate Chaining Hash Table
LinkedList
Resizing Array
Heap
LLRB
BST (Vanilla)
B-Trees (2-3 / 2-3-4)
Heap
Ordered Linked List
Balanced Tree
Resizing Array
LinkedList
Quick Find
Quick Union
Weighted QU
WQUPC
BST and Hash Table Set Runtimes
Runtimes for our Balanced Search Tree and Hash Table implementations were very fast.
If we know that our keys all have some common special property, we can sometimes get even better implementations.
Example: Suppose we know our keys are always single ASCII characters.
| contains(x) | add(x) | |
Balanced BST | Θ(log N) | Θ(log N) | |
Resizing Separate�Chaining Hash Table | Θ(1) assuming even spread | Θ(1) on average, assuming even spread | |
Special Case 1: Character Keyed Map
Suppose we know that our keys are always ASCII characters.
public class DataIndexedCharMap<V> {
private V[] items;
public DataIndexedCharMap(int R) {
items = (V[]) new Object[R];
}
public void put(char c, V val) {
items[c] = val;
}
public V get(char c) {
return items[c];
}
}
R is the number of possible characters, e.g. 128 for ASCII.
| key type | get(x) | add(x) |
Balanced BST | comparable | Θ(log N) | Θ(log N) |
Resizing Separate Chaining Hash Table | hashable | Θ(1) assuming even spread | Θ(1) on average, assuming even spread |
data indexed array | chars | Θ(1) | Θ(1) |
Special Case 2: String Keyed Map
Suppose we know that our keys are always strings.
Tries will have great performance on:
| key type | get(x) | add(x) |
Balanced BST | comparable | Θ(log N) | Θ(log N) |
Resizing Separate Chaining Hash Table | hashable | Θ(1) assuming even spread | Θ(1) on average, assuming even spread |
data indexed array | chars | Θ(1) | Θ(1) |
Tries | Strings | ? | ? |
Sets of Strings
Suppose we have a set containing “sam”, “sad”, “sap”, “same”, “a”, and “awls”.
sad
same
sap
awls
a
0
1
2
3
sad
awls
a
same
sap
sam
sam
BST
Hash Table
Tries: Each Node Stores One Character
For String keys, we can use a “Trie”. Key ideas:
Above, we show the results of adding “sam” and sad”. Use your intuition to try to insert the remaining items “sap”, “same”, “a”, and “awls”.
s
a
m
d
Tries: Each Node Stores One Character
For String keys, we can use a “Trie”. Key ideas:
Above, we show the results of adding “sam” and sad”. Use your intuition to try to insert the remaining items “sap”, “same”, “a”, and “awls”.
s
a
m
d
p
e
a
Tries: Each Node Stores One Character
For String keys, we can use a “Trie”. Key ideas:
Try to figure out a way to make it clear that our set contains “sam”, “sad”, “sap”, “same”, “a”, and “awls”, but not “aw”, “awl”, “sa”, etc.
s
a
m
d
p
e
a
w
l
s
Tries: Each Node Stores One Character
For String keys, we can use a “Trie”. Key ideas:
Try to figure out a way to make it clear that our set contains “sam”, “sad”, “sap”, “same”, “a”, and “awls”, but not “aw”, “awl”, “sa”, etc.
s
a
m
d
p
e
a
w
l
s
s
Tries: Search Hits and Misses
Suppose we insert “sam”, “sad”, “sap”, “same”, “a”, and “awls”.
s
a
m
d
p
e
a
w
l
s
Two ways to have a search “miss”:
true, blue node
false, white node
true, blue node
false, fell off tree
“hit”
“miss”
Trie Maps
Tries can also be maps, of course.
For an animated demo of the creation of this map, see this demo from our optional Algorithms textbook.
e.g. maps “by” to 4.
y 4
b
s
e
a 6
l
l
s 1
h
e 0
l
l
s 3
o
r
e 7
t
h
e 5
Tries: A Digit-by-Digit Set Representation
s
a
m
d
p
e
a
w
l
s
sad
same
sap
awls
a
0
1
2
3
sad
awls
a
same
sap
sam
sam
BST
HashSet
Trie
Tries
Trie:
Trie Implementation and Performance
Lecture 28, CS61B, Fall 2024
Tries (Conceptual)
Trie Implementation and Performance
Alternate Child Tracking Strategies
Trie String Operations
Autocomplete
Trie Summary
Very Basic Trie Implementation
The first approach might look something like the code below.
s
a
m
d
p
e
a
w
l
s
public class TrieSet {
private static final int R = 128; // ASCII
private Node root; // root of trie
private static class Node {
private char ch;
private boolean isKey;
private DataIndexedCharMap<Node> next;
private Node(char c, boolean b, int R) {
ch = c; isKey = b;
next = new DataIndexedCharMap<>(R);
}
}
}
Since we know our keys are characters, can use a DataIndexedCharMap.
Zooming in On a Node
Each DataIndexedCharMap is an array of 128 possible links, mostly null.
a
w
...
...
private static class Node {
private char ch;
private boolean isKey;
private DataIndexedCharMap<Node> next;
private Node(char c, boolean b, int R) {
ch = c; isKey = b;
next = new DataIndexedCharMap<>(R);
}
Zooming in On a Node
Better drawing of a DataIndexedCharMap based trie is shown to the right.
a
w
private static class Node {
private char ch;
private boolean isKey;
private DataIndexedCharMap<Node> next;
private Node(char c, boolean b, int R) {
ch = c; isKey = b;
next = new DataIndexedCharMap<>(R);
}
a
w
w
...
128 links, with one used, and 127 equal to null.
Very Basic Trie Implementation
If we use a DataIndexedCharMap to track children, every node has R links.
private static class Node {
private char ch;
private boolean isKey;
private DataIndexedCharMap<Node> next;
private Node(char c, boolean b, int R) {
ch = c; isKey = b;
next = new DataIndexedCharMap<>(R);
}
s
a
d
a
w
l
a
s
a
d
w
l
...
...
...
...
...
...
...
...
public class DataIndexedCharMap<V> {
private V[] items;
public DataIndexedCharMap(int R) {
items = (V[]) new Object[R];
}
...
}
Very Basic Trie Implementation
Observation: The letter stored inside each node is actually redundant.
public class TrieSet {
private static final int R = 128; // ASCII
private Node root; // root of trie
private static class Node {
private char ch;
private boolean isKey;
private DataIndexedCharMap<Node> next;
private Node(char c, boolean b, int R) {
ch = c; isKey = b;
next = new DataIndexedCharMap<>(R);
}
}
}
a
s
a
d
w
l
...
...
...
...
...
...
...
...
Trie Performance in Terms of N
Given a Trie with N keys. What is the: [N = 6]
a
s
a
d
w
l
...
...
...
...
...
...
...
...
s
m
p
e
...
...
Trie Performance in Terms of N
Given a Trie with N keys. What is the: [N = 6]
a
s
a
d
w
l
...
...
...
...
...
...
...
...
s
m
p
e
...
...
Runtimes independent of number of keys!
Or in terms of L, the length of the key:
Trie Performance in Terms of N
When our keys are strings, Tries give us slightly better performance on contains and add.
One downside of the DataIndexedCharMap-based Trie is the huge memory cost of storing R links per node.
Runtimes treating length of keys as a constant
| key type | get(x) | add(x) |
Balanced BST | comparable | Θ(log N) | Θ(log N) |
Resizing Separate Chaining Hash Table | hashable | Θ(1) assuming even spread | Θ(1) on average, assuming even spread |
data indexed array | chars | Θ(1) | Θ(1) |
Tries | Strings | Θ(1) | Θ(1) |
Alternate Child Tracking Strategies
Lecture 28, CS61B, Fall 2024
Tries (Conceptual)
Trie Implementation and�Performance
Alternate Child Tracking Strategies
Trie String Operations
Autocomplete
Trie Summary
Trie Performance in Terms of N
Using a DataIndexedCharMap is very memory hungry.
a
s
a
d
w
l
private static class Node {
private boolean isKey;
private DataIndexedCharMap<Node> next;
private Node(char c, boolean b, int R) {
ch = c; isKey= b;
next = new DataIndexedCharMap<>(R);
}
The DataIndexedCharMap Trie
Can use ANY kind of map from character to node, e.g.
isKey:
| | | | | |
links:
F |
...
97
98
100
99
...
isKey:
| | | | | |
links:
F |
...
97
98
100
99
...
isKey:
| | | | | |
links:
T |
...
97
98
100
99
...
a
d
isKey:
| | | | | |
links:
T |
...
97
98
100
99
...
c
Fundamental problem, our arrays are ‘sparse’.
Alternate Idea #1: The Hash-Table Based Trie
isKey:
F |
a
d
c
links:
isKey:
F |
links:
isKey:
T |
links:
c
a
0
1
2
3
d
0
1
2
3
0
1
2
3
Alternate Idea #2: The BST-Based Trie
isKey:
F |
a
d
c
‘c’ | |
‘a’ | |
links:
isKey:
F |
‘d’ | |
links:
isKey:
T |
links:
isKey:
T |
links:
The Three Trie Implementations
When we implement a Trie, we have to pick a map to our children
dictionary from letter to Node
DataIndexedCharMap
Hash Table
Balanced BST
this.next
Performance of the DataIndexedCharMap, BST, and Hash Table Trie
Using a BST or a Hash Table to store links to children will usually use less memory.
Using a BST or a Hash Table will take slightly more time.
Since R is fixed (e.g. 128), can think of all 3 as Θ(1).
s
a
m
d
p
e
a
w
l
s
Trie Performance in Terms of N
When our keys are strings, Tries give us slightly better performance on contains and add.
… but where Tries really shine is their efficiency with special string operations!
Runtimes treating length of keys as a constant
| key type | get(x) | add(x) |
Balanced BST | comparable | Θ(log N) | Θ(log N) |
Resizing Separate Chaining Hash Table | hashable | Θ(1) assuming even spread | Θ(1) on average, assuming even spread |
data indexed array | chars | Θ(1) | Θ(1) |
Tries (BST, Hash Table, Data Indexed Char Map) | Strings | Θ(1) | Θ(1) |
Trie String Operations
Lecture 28, CS61B, Fall 2024
Tries (Conceptual)
Trie Implementation and�Performance
Alternate Child Tracking Strategies
Trie String Operations
Autocomplete
Trie Summary
String Specific Operations
Theoretical asymptotic speed improvement is nice. But the main appeal of tries is their ability to efficiently support string specific operations like prefix matching.
s
a
m
d
p
e
a
w
l
s
sad
same
sap
awls
a
sam
0
1
2
3
sad
awls
a
same
sap
sam
Prefix Matching Operations
Theoretical asymptotic speed improvement is nice. But the main appeal of tries is their ability to efficiently support string specific operations like prefix matching.
s
a
m
d
p
e
a
w
l
s
Examples:
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect() returns ["a", "awls", "sad", "sam", "same", "sap"]
s
a
m
d
p
e
a
w
l
s
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect() returns ["a", "awls", "sad", "sam", "same", "sap"]
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect() returns ["a", "awls", "sad", "sam", "same", "sap"]
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
x = []
colHelp("a", x, )
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
x = ["a"]
colHelp("aw", x, )
colHelp("a", x, )
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
x = ["a"]
colHelp("a", x, )
colHelp("aw", x, )
colHelp("awl", x, )
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
x = ["a"]
colHelp("a", x, )
colHelp("aw", x, )
colHelp("awls", x, )
colHelp("awl", x, )
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
x = ["a", "awls"]
colHelp("a", x, )
colHelp("aw", x, )
colHelp("awls", x, )
colHelp("awl", x, )
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
colHelp("s", x, )
x = ["a", "awls"]
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
colHelp("s", x, )
colHelp("sa", x, )
x = ["a", "awls"]
Challenging Warmup Exercise: Collecting Trie Keys
Challenging Exercise: Give an algorithm for collecting all the keys in a Trie.
collect():
colHelp(String s, List<String> x, Node n):
s
a
m
d
p
e
a
w
l
s
x = ["a", "awls", "sad",
"sam", "same", "sap"]
colHelp("s", x, )
colHelp("sa", x, )
...
Usages of Tries
Challenge: Give an algorithm for keysWithPrefix.
s
a
m
d
p
e
a
w
l
s
Usages of Tries
Challenge: Give an algorithm for keysWithPrefix.
Algorithm:
Another common operation: LongestPrefixOf. See lab.
s
a
m
d
p
e
a
w
l
s
Autocomplete
Lecture 28, CS61B, Fall 2024
Optional, see videos if you're curious.
Tries (Conceptual)
Trie Implementation and�Performance
Alternate Child Tracking Strategies
Trie String Operations
Autocomplete
Trie Summary
The Autocomplete Problem
Example, when I type “how are” into Google, I get 10 results, shown to the right.
One way to do this is to create a Trie based map from strings to values
Autocomplete Example, for Top Three Matches
Suppose we have six strings with values shown below:
If the user types “s”, we:
s
a
m
d
p
o
b
u
c
k
g
y
i
t
e
10
12
5
15
20
7
Autocomplete Example, for Top Three Matches
Suppose we have six strings with values shown below:
If the user types “s”, we:
s
a
m
d
p
o
b
u
c
k
g
y
i
t
e
The Autocomplete Problem
One way to do this is to create a Trie based Dictionary that maps strings to values.
The approach above has one major flaw. If we enter a short string, the number of keys with the appropriate prefix will be too big.
A More Efficient Autocomplete
One way to address this issue:
s
a
m
d
p
o
b
u
c
k
g
y
i
t
e
None
10
None
10
None
10
10
10
value = None
best = 20
None
20
None
12
12
12
None
5
None
5
5
5
None
20
None
20
15
20
20
20
20
7
7
A More Efficient Autocomplete
One way to address this issue:
Search will consider nodes in order of “best”.
s
a
m
d
p
o
b
u
c
k
g
y
i
t
e
None
10
None
10
None
10
10
10
value = None
best = 20
None
20
None
12
12
12
None
5
None
5
5
5
None
20
None
20
15
20
20
20
20
7
7
Even More Efficient Autocomplete
Can also merge nodes that are redundant!
s
ad
mog
p
buck
y
it
e
10
10
value = None
best = 20
None
20
12
12
5
5
None
20
15
20
20
20
20
7
7
Trie Summary
Lecture 28, CS61B, Fall 2024
Tries (Conceptual)
Trie Implementation and�Performance
Alternate Child Tracking Strategies
Trie String Operations
Autocomplete
Trie Summary
Tries
When your key is a string, you can use a Trie:
Bottom line: Data structures interact in beautiful and important ways!
Domain Specific Sets and Maps
More generally, we can sometimes take special advantage of our key type to improve our sets and maps.