LO 4.2.3.D

Learning Objective: Describe tree pruning, specifically cost complexity (weakest link) pruning.

Review:

Recursive binary splitting can produce an overly complex tree that overfits the training data. A better strategy is therefore to grow a very large tree T0 and then prune it back to obtain a subtree T ⊂ T0.
Cost complexity pruning (also known as weakest link pruning) provides an efficient way to do this. Rather than considering every possible subtree, we consider a sequence of trees indexed by a nonnegative tuning parameter α.
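To make the "sequence of trees indexed by α" concrete, here is a minimal Python sketch of one common way such a sequence can be generated: repeatedly collapse the internal node whose removal increases the training error least per terminal node removed (the "weakest link"). The Node class, the helper names, and the toy responses are all hypothetical and chosen for illustration. Error is measured as the residual sum of squares (RSS) around each node's mean, and the sketch ignores ties, so it is not a full implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node holding the training responses that reach it."""
    ys: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def rss(ys: List[float]) -> float:
    """Residual sum of squares around the mean of ys."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def subtree_stats(node: Node) -> Tuple[float, int]:
    """Return (total RSS over the leaves, number of leaves) below node."""
    if node.is_leaf():
        return rss(node.ys), 1
    l_rss, l_n = subtree_stats(node.left)
    r_rss, r_n = subtree_stats(node.right)
    return l_rss + r_rss, l_n + r_n


def internal_nodes(node: Node) -> List[Node]:
    """List every non-terminal node in the subtree rooted at node."""
    if node.is_leaf():
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)


def weakest_link(root: Node) -> Tuple[float, Node]:
    """Find the internal node with the smallest RSS increase per leaf removed."""
    best = None
    for node in internal_nodes(root):
        sub_rss, n_leaves = subtree_stats(node)
        g = (rss(node.ys) - sub_rss) / (n_leaves - 1)
        if best is None or g < best[0]:
            best = (g, node)
    return best


def pruning_sequence(root: Node) -> List[Tuple[float, int]]:
    """Collapse weakest links until only the root is left.

    Returns (alpha, number of leaves) pairs. Note: mutates the tree in place.
    """
    seq = [(0.0, subtree_stats(root)[1])]
    while not root.is_leaf():
        alpha, node = weakest_link(root)
        node.left = node.right = None  # collapse the subtree into one leaf
        seq.append((alpha, subtree_stats(root)[1]))
    return seq


# Toy tree (assumed data): the root splits off {20, 21}, then {1, 2} from {8, 9}.
tree = Node(
    [1, 2, 8, 9, 20, 21],
    left=Node([1, 2, 8, 9], left=Node([1, 2]), right=Node([8, 9])),
    right=Node([20, 21]),
)
sequence = pruning_sequence(tree)
print([n_leaves for _, n_leaves in sequence])  # leaf counts: [3, 2, 1]
print([alpha for alpha, _ in sequence])        # alpha at which each tree takes over
```

Each α in the output marks the point at which the next, smaller subtree becomes preferable, so only a handful of trees need to be compared instead of every possible subtree.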

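The goal is to find, for each value of α, the subtree T ⊂ T0 that minimizes the loss function

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

where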

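α|T| is the penalty component, which charges for every terminal node (reminiscent of the lasso penalty).

α ≥ 0 is the tuning parameter. It controls the trade-off between the subtree's complexity and its fit to the training data; when α = 0, the subtree T is simply T0.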

|T| is the number of terminal nodes of the tree T.

Rm is the rectangle corresponding to the mth terminal node.

ŷ_Rm is the predicted response associated with Rm, that is, the mean of the training observations in Rm.
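The definitions above can be checked numerically. Below is a minimal Python sketch that evaluates the penalized loss (training RSS plus α times the number of terminal nodes) for a few candidate subtrees and picks the minimizer for several values of α. Each subtree is represented simply as a list of terminal nodes, each holding the training responses that fall in its region. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from typing import List

Leaves = List[List[float]]  # one inner list of training responses per terminal node


def rss(ys: List[float]) -> float:
    """Residual sum of squares around the mean response of one region."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def cost_complexity(leaves: Leaves, alpha: float) -> float:
    """Training RSS summed over terminal nodes, plus alpha * |T|."""
    return sum(rss(ys) for ys in leaves) + alpha * len(leaves)


def best_subtree(candidates: List[Leaves], alpha: float) -> Leaves:
    """Return the candidate subtree with the smallest penalized loss."""
    return min(candidates, key=lambda leaves: cost_complexity(leaves, alpha))


# Three nested candidate subtrees on the same six observations (assumed data).
candidates = [
    [[1, 2], [8, 9], [20, 21]],  # |T| = 3
    [[1, 2, 8, 9], [20, 21]],    # |T| = 2
    [[1, 2, 8, 9, 20, 21]],      # |T| = 1
]

for alpha in (0.0, 100.0, 400.0):
    chosen = best_subtree(candidates, alpha)
    print(alpha, len(chosen))  # number of terminal nodes chosen: 3, then 2, then 1
```

As α grows, each extra terminal node costs more, so the minimizer moves from the largest subtree toward the single-node tree.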

Source: Assigned reading