1 of 48

Lecture 8:

Data Structures (Trees)

2 of 48

Binary Tree

2

3 of 48

Definition of Tree

  • A general tree T is partitioned into disjoint subsets:
    • A single node r, the root
    • Sets of general trees, called subtrees of r
  • Terminology
    • node (vertex)
    • edge
    • parent
    • child
    • siblings
    • root
    • leaf
    • ancestor
    • descendant
    • subtree

3

4 of 48

Tree Examples

4

5 of 48

Definition of Binary Tree

  • T is binary tree, if
    • T is empty, or
    • T is partitioned into three disjoint subsets:
      • A single node r, the root
      • Up to two sets that are binary trees, called left and right subtrees of r

r

TL

TR

5

6 of 48

Announcement

  • Soo will be traveling next week!
    • One video lecture will be out for Tree part 2
    • Quiz 2 will be on 10/30.

6

7 of 48

An Example of Binary Tree

  • Algebraic expressions

7

8 of 48

Height of a Tree

  • Def. Height of a Tree: the number of nodes on the longest path from the root to a leaf.

Height: 3

Height: 5

Height: 7

8

9 of 48

Full Binary Tree

  • Def. A binary tree T of height h is called Full binary tree, if
    • it is empty (h = 0), or
    • both its left and right subtrees are full (of height h - 1).

Full BT of

Height 0

Full BT of

Height 1

Full BT of

Height 2

Full BT of

Height 3

Full BT of Height 4

So now, you may feel why this is called full binary tree:

it is filled with nodes at all possible positions!

9

10 of 48

Complete Binary Tree

  • Def. A binary tree T is called Complete binary tree, if
    • T is full down to level h - 1, and
    • with level h filled in from left to right.

Complete BT of

Height 2

Complete BT of

Height 3

Complete BT of Height 4

Must be filled (to be complete)

Optionally filled (to be complete)

10

11 of 48

Array-based Implementation

  • Recall: Array is a fundamental data structure with contiguous fixed-length chunk of memory space.
    • When we implement List with an array, it was straightforward to map the indices.
    • How can we assign designated indices for a binary tree?
  • Let’s assume the tree is full, and assign the indices from root, left to right.
  • This implementation is efficient if the tree is (almost) full or complete.

[1]

[2]

[3]

[4]

[5]

[6]

[0]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

8

9

7

2

3

4

1

5

5

2

1

8

7

3

4

9

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

Height 1

Height 2

Height 3

Height 4

11

12 of 48

Array-based Implementation

  • One more thing to consider: How can we figure out direct children of a node?
    • With a tree, we usually access data from the root to the leaf.
    • Common rule connecting a node to its children?
  • From parent, its left child:�[parent index] * 2 + 1 😀
  • Right child:�[parent index] * 2 + 2 😄
  • Similarly, from a child, its parent:�[child index - 1] // 2

[1]

[2]

[3]

[4]

[5]

[6]

[0]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

8

7

2

3

9

2

1

5

12

13 of 48

Array-based Implementation

  • Now, think about more general case: the tree may not be close to complete.
    • If we don’t have a node, we may set None (or some other indicator for emptiness).
  • This implementation becomes less efficient if the tree gets far from full or complete.

5

2

1

8

None

3

None

None

7

[0]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[1]

[2]

[3]

[4]

[5]

[6]

[0]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

8

7

2

3

9

2

1

5

13

14 of 48

Reference-based Implementation

  • It is similar to the linked list; instead of having a single link to another node, Binary Tree Node has two references to other nodes!

class TreeNode():

def __init__(self, x):

self.item = x

self.left = None

self.right = None

class BinaryTree():

def __init__(self, node):

self.root = node

14

15 of 48

Tree Traversal

  • Traversal: visiting all nodes once
    • It is not straightforward to visit all nodes in a given tree, unlike an array or a linked list.�
  • There can be many different ways of traversal. We cover a few widely-used ones:
  • Inside each subtree, the same rule applies recursively.

r

TL

TR

r

TL

TR

r

TL

TR

Preorder

Root - Left - Right

2

1

3

Inorder

Left - Root - Right

Postorder

Left - Right - Root

1

2

3

1

3

2

15

16 of 48

Tree Traversal Example

  • Preorder traversal?
  • Inorder traversal?
  • Postorder traversal?

60 {20 subtree} 70

60 20 10 {40 subtree} 70

60 20 10 40 30 50 70

{20 subtree} 60 70

10 20 {40 subtree} 60 70

10 20 30 40 50 60 70

{20 subtree} 70 60

10 {40 subtree} 20 70 60

10 30 50 40 20 70 60

60

20

70

10

40

30

50

16

17 of 48

Tree Traversal Implementation

  • Use recursion!
    • Base case: empty tree. Do nothing!
    • General case: 2 recursive calls + visiting the root.
    • The order depends on {pre, in, post}-order.

def preorder(node):

if node is not None:

print(node.item)

preorder(node.left)

preorder(node.right)

preorder(root)

def inorder(node):

if node is not None:

inorder(node.left)

print(node.item)

inorder(node.right)

inorder(root)

17

18 of 48

Binary Search Tree

18

19 of 48

Definition of Binary Search Tree

  • Each node has a search key.
    • There are no duplicates among the search keys in a binary search tree.�
  • For each node n, it satisfies:
    • n’s key is greater than all keys in its left subtree TL.
    • n’s key is less than all keys in its right subtree TR.
    • Both TL and TR are binary search trees.

r

TL

TR

<

<

19

20 of 48

Binary Search Tree Operations

  • Insert a new item into a binary search tree.�
  • Retrieve (search) the item with a given search key from a binary search tree.�
  • Delete the item with a given search key from a binary search tree.�
  • For all, according to the rule that {left subtree} < root < {right subtree} at all levels!

20

21 of 48

Binary Search Tree Examples

Check! All these 3 BSTs contain the same data.

21

22 of 48

Search (Retrieval)

  • Task: Search if there is a node with the given key, and output the item if so.
    • If the given key is not in the tree, we should be able to figure it out as well.
  • Main Idea: At each node, decide which subtree to search further. Only 3 cases:
    • [Case 1] If the search key = item at the node, we found it!
    • [Case 2] If the search key < item at the node, the target must be in its left subtree if exists.
    • [Case 3] If the search key > item at the node, the target must be in its right subtree if exists.
    • For Case 2 & 3, move to the corresponding subtree, then repeat the same testing.
    • Repeat until you reach at a leaf. If you still do not meet Case 1, we conclude the key is not in the tree.

60

20

70

10

40

30

50

40

55

Found!

Not exists

22

23 of 48

Search (Retrieval): Implementation

  • Implementation based on recursive calls:

Base case (empty BST): Not found!

General case: case 1, 2, 3

def search(root, key):

if root is None:

return None # Not found

elif key == root.item:

return root # Found

elif key < root.item:

return search(root.left, key)

else: # key > root.item

return search(root.right, key)

23

24 of 48

Search (Retrieval): Time Complexity

  • Time complexity of search operation?
    • At each node, we compare two values once, and decide where to go. → O(1)
    • How many times?
  • Thus, the worst time performance = tree height!
  • In terms of N (the number of data points in the tree), what’s the asymptotic (Big O) complexity of tree search?

If the tree is balanced (close to full), the height gets closer to O(log N).

If the tree is unbalanced (close to linked list), the height gets closer to O(N).

Then, how the shape of a tree is determined? We will revisit this soon.

Length from the root to the leaf!

24

25 of 48

Insertion

  • Task: Insert a new key into the BST, preserving BST conditions.
    • We need to find the right location to put the new node.
  • Main Idea: Insertion is basically a failed search. When we conclude the item does not exist, insert the new node right there!
    • [Case 1] If the search key = item at the node, we found it!
    • [Case 2] If the search key < item at the node, the target must be in its left subtree if exists.
    • [Case 3] If the search key > item at the node, the target must be in its right subtree if exists.
    • For Case 2 & 3, move to the corresponding subtree, then do the same testing.
    • Repeat until you reach at a leaf. Insert the new node there!

60

20

70

10

40

30

50

55

55

25

26 of 48

Insertion: Implementation

  • Implementation based on recursive calls:

Base case: Create new node

General case: 3 ways

def insert(root, item):

if root is None:

new_node = TreeNode(item)

return new_node

elif key == root.item:

# ERROR: already exists

elif key < root.item:

new_subtree = insert(root.left, item)

root.left = new_subtree

return root

else: # key > root.item

new_subtree = insert(root.right, item)

root.right = new_subtree

return root

26

27 of 48

Deletion

  • Task: Delete a node with the given key, preserving BST conditions after deletion.
    • After deletion of an intermediate node, the tree will be broken 😨.
  • Main Idea:
    • [Case 1] If the node to delete is a leaf, simply remove it.
    • [Case 2] If the node to delete has a single child, the subtree will take it over.
    • [Case 3] If the node to delete has both children, elect the leftmost item in the right subtree as the new root.

27

28 of 48

Deletion (Case 1)

  • When the target node is a leaf, we can simply remove it.
  • After deletion, the resulting tree
    • is still connected, and
    • still satisfies the BST conditions.

60

20

70

10

40

30

50

80

35

28

29 of 48

Deletion (Case 2)

  • When the target node has one child, the existing child takes the position of the target node, taking its descendents (subtree).
  • After deletion, the resulting tree
    • is broken,
    • still satisfies the BST conditions.
  • After the right subtree takes over, the resulting tree
    • is no longer broken,
    • still satisfies the BST conditions.

This slide is best seen with animations.

60

20

70

10

40

30

50

80

35

29

30 of 48

Deletion (Case 3)

  • When the target node has both children, we elect the target’s immediate successor (=left-most node in the right subtree) as the new root.
  • After deletion, the resulting tree is broken.
    • Cannot simply take one subtree as new root, since we have two.
  • To make the resulting tree satisfy BST conditions, we elect the immediate successor of the deleted node. The resulting tree, however, is still broken!
    • Why immediate successor? With it, it’s guaranteed to satisfy the BST conditions!

This slide is best seen with animations.

Similarly, we may nominate the immediate predecessor (=right-most item in the left subtree.)

  • The deleted immediate successor node may need to adopt its orphan right child (if any).
    • Here, it is guaranteed that no left child exists.

60

20

70

10

40

30

50

80

30

35

35

30

31 of 48

Deletion: Implementation

Locating the target

delete function returns the subtree to replace the target node to be deleted.

def delete(root, key):

if root is None:

return root

if key < root.key:

root.left = delete(root.left, key)

elif key > root.key:

root.right = delete(root.right, key)

else:

... (to be continued)

31

32 of 48

Deletion: Implementation

Case 3: both children

Case 1: no child if root.right is also None

Case 2: single child

After this line, we now delete the node im_su.

... (continuing)�

else:

if root.left is None:

new_root = root.right

root = None

return new_root

elif root.right is None:

new_root = root.left

root = None

return new_root

else:

im_su = get_immediate_successor(root)

root.key = im_su.key

root.right = delete(root.right, im_su.key)

return root

def get_immediate_successor(target):

curr = target.right

while curr.left is not None:

curr = curr.left

return curr

32

33 of 48

Revisiting the shape of BST

  • For a given data, how is the shape of BST determined?
    • Let’s try to insert “4, 7, 2, 3, 5, 1, 6”:

4

7

2

3

5

1

6

    • Suppose the same data is given in a different order: “7, 6, 5, 4, 3, 2, 1”:

7

6

5

4

3

2

1

33

34 of 48

Balanced Trees

  • We desire to make the tree more balanced for faster operations.
  • There are some special trees that guarantee balance up to some level, but details of them are beyond of this course.

Red-black tree: guarantees that the height is lower than 2 log(N+1).

AVL tree: guarantees that heights of the two child subtrees of any node differ by at most 1.

34

35 of 48

Tree Sort

  • Inorder traversal on a binary search tree lists the data in sorted order.
    • Due to the BST conditions, all values in the left subtree < root value < all values in the right subtree, at all nodes.
    • Inorder traversal visits nodes in the order of left - root - right!
  • Time complexity?
    • We first need to insert all data into a BST.�→ O(log N) per each element × N of them = O(N log N)�(This assumes use one of the balanced trees!)
    • Then, inorder traversal: O(N)�← we visit one node at a time.
    • Overall, O(N log N) if the tree is balanced.�Otherwise, worst performance will be O(N2).

r

TL

TR

Inorder

Left - Root - Right

1

2

3

35

36 of 48

Time Complexity

  • Time complexity of Binary Search Tree operations?

= When the tree is significantly unbalanced.

When?

Task

Average-case

Worst-case

Insertion

Retrieval

Deletion

Traversal

O(log N)

O(log N)

O(log N)

O(N)

O(N)

O(N)

O(N)

O(N)

36

37 of 48

Applications of Binary Search Trees

37

38 of 48

HW1

38

39 of 48

HW1

39

40 of 48

HW2

  • Test if a binary tree is balanced.
    • A binary tree is balanced if the difference in the height of its left and right subtrees is at most 1 for all nodes in the tree.

60

20

70

10

40

30

50

60

20

70

10

40

No

Yes

40

41 of 48

HW2

41

42 of 48

HW3

  • Test if two binary trees are symmetric.

60

20

70

10

40

30

50

60

20

70

10

40

30

50

42

43 of 48

HW3

43

44 of 48

HW4

  • Given a sorted list of items, build a balanced binary search tree.

44

45 of 48

HW4

45

46 of 48

HW5

46

47 of 48

HW6

47

48 of 48

HW7

48