So we evaluate the entire tree on the validation set.
For each pruning candidate t we now need to calculate:
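R(t) = r(t) * p(t), where r(t) is the misclassification rate at node t and p(t) is the fraction of records that reach t
R(T_t) = sum of R(t') over all leaves t' \in f(T_t), where T_t is the subtree rooted at t and f(T_t) is its set of leaves
g_i(t) = (R(t) - R(T_t)) / (|f(T_t)| - 1)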
Candidate | R(t) | R(T_t) | g_i(t) |
t1 | 25/100 * 100/100 = 25/100 (r(t1) = 1 - 75/100; t1 is the root, so it covers all records, p(t1) = 100/100) | T_t1 is the entire tree: 6/100 are misclassified | (25/100 - 6/100) / (3 - 1) = (19/100) / 2 = 19/200 |
t2 | 21/49 * 49/100 = 21/100 (t2 covers only 49 of the 100 records) | T_t2 is the subtree rooted at t2: 5/100 are misclassified | (21/100 - 5/100) / (2 - 1) = 16/100 = 32/200 |
Node t1 minimizes g_i(t) (19/200 < 32/200), so we prune there, i.e. collapse the tree to the root
\alpha^0 = 0, \alpha^1 = 19/200
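
The arithmetic above can be double-checked with a short script. This is only a sketch of this one pruning step: the helper name `g` and the `candidates` dictionary are made up for illustration, and the numbers are copied from the table rather than computed from an actual tree. Exact fractions are used so that the results match the hand calculation.

```python
from fractions import Fraction


def g(node_error, subtree_error, n_leaves):
    """(R(t) - R(T_t)) / (|f(T_t)| - 1) for one pruning candidate."""
    return (node_error - subtree_error) / (n_leaves - 1)


# Hypothetical container for the table rows:
# name -> (r(t), p(t), R(T_t), number of leaves |f(T_t)|)
candidates = {
    "t1": (Fraction(25, 100), Fraction(100, 100), Fraction(6, 100), 3),
    "t2": (Fraction(21, 49), Fraction(49, 100), Fraction(5, 100), 2),
}

scores = {}
for name, (r, p, subtree_error, n_leaves) in candidates.items():
    node_error = r * p  # R(t) = r(t) * p(t)
    scores[name] = g(node_error, subtree_error, n_leaves)

# The candidate with the smallest g_i(t) is pruned first,
# and that smallest value becomes the next alpha.
weakest = min(scores, key=scores.get)
alpha_1 = scores[weakest]

for name, value in scores.items():
    print(name, value)
print("prune at", weakest, "with alpha^1 =", alpha_1)
```

Note that `Fraction` reduces its results, so the value for t2 is displayed as 4/25, which equals the 32/200 in the table.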