Decision Trees pruning ex

So we evaluate the entire tree on the validation set

For each pruning candidate t we now need to calculate:

R(t) = r(t) * p(t),

R(T_t) = sum of R(t') over all leaves t' of T_t

g_i(t) = (R(t) - R(T_t)) / (| f(T_t) | - 1), where | f(T_t) | is the number of leaves of T_t
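
The three quantities above can be sketched in Python as follows. This is only a minimal sketch: the function names (node_risk, subtree_risk, g) and the example node are assumptions made for illustration and are not part of the exercise. Fraction is used so that the results stay exact.

```python
from fractions import Fraction

def node_risk(r_t, p_t):
    """R(t) = r(t) * p(t): error rate at node t times the share of examples reaching t."""
    return r_t * p_t

def subtree_risk(leaf_risks):
    """R(T_t): sum of R(t') over the leaves t' of the subtree rooted at t."""
    return sum(leaf_risks, Fraction(0))

def g(R_t, R_Tt, n_leaves):
    """g(t) = (R(t) - R(T_t)) / (|f(T_t)| - 1), with |f(T_t)| the number of leaves."""
    return (R_t - R_Tt) / (n_leaves - 1)

# Hypothetical node (not from the exercise): 10 of its 40 examples are
# misclassified, out of 100 examples in total; its subtree has 2 leaves
# with risks 3/100 and 1/100.
R_t = node_risk(Fraction(10, 40), Fraction(40, 100))        # 1/10
R_Tt = subtree_risk([Fraction(3, 100), Fraction(1, 100)])   # 1/25
print(g(R_t, R_Tt, n_leaves=2))                             # 3/50
```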

Candidate | R(t) | R(T_t) | g_i(t)
t1 | 25/100 * 100/100 = 25/100 | 6/100 | (25/100 - 6/100)/(3 - 1) = 19/100 / 2 = 19/200
t2 | 21/49 * 49/100 = 21/100 | 5/100 | (21/100 - 5/100)/(2 - 1) = 16/100 = 32/200

t1: r(t1) = 1 - 75/100 = 25/100. This node is the root, so it covers all the examples: p(t1) = 100/100. T_t1 is the entire tree, and 6/100 are misclassified.

t2: r(t2) = 21/49. This node covers only 49 out of 100 examples: p(t2) = 49/100. T_t2 is the tree with root at t2, and 5/100 are misclassified.

t1 has the smallest g_i(t) value (19/200 < 32/200), so we prune at t1, i.e. prune the tree down to the root.

\alpha^0 = 0, \alpha^1 = 19/200
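
The selection step can be checked numerically. The sketch below plugs in the R(t), R(T_t) and leaf counts from the worked example above and picks the candidate with the smallest g_i(t). The variable names (candidates, g_values, weakest, alpha_1) are assumptions introduced for illustration.

```python
from fractions import Fraction

# (R(t), R(T_t), number of leaves of T_t), taken from the worked example above
candidates = {
    "t1": (Fraction(25, 100), Fraction(6, 100), 3),
    "t2": (Fraction(21, 100), Fraction(5, 100), 2),
}

# g_i(t) = (R(t) - R(T_t)) / (|f(T_t)| - 1) for every candidate
g_values = {
    name: (R_t - R_Tt) / (leaves - 1)
    for name, (R_t, R_Tt, leaves) in candidates.items()
}

# The candidate with the smallest g_i(t) is pruned first; its value is alpha^1
weakest = min(g_values, key=g_values.get)
alpha_1 = g_values[weakest]

print(g_values["t1"], g_values["t2"])  # 19/200 4/25
print(weakest, alpha_1)                # t1 19/200
```

Note that Fraction reduces 32/200 to 4/25, which is the same value as in the table.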