LO 4.2.3.E
Learning Objective: Describe the construction of classification trees using classification error rate, Gini index, and cross-entropy.
Review:
We grow a classification tree with recursive binary splitting, just as we do for a regression tree, but instead of RSS we use the classification error rate as the splitting criterion.
The classification error rate is the fraction of the training observations in a region that do not belong to the most common class:

$$E = 1 - \max_{k}\left(\hat{p}_{mk}\right),$$

where $\hat{p}_{mk}$ represents the proportion of training observations in the $m$th region that are from the $k$th class.
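As a quick illustration, here is a minimal Python sketch of this measure; the function name classification_error and the use of numpy are illustrative assumptions, and the input is taken to be the class proportions already computed for one node.

```python
import numpy as np

def classification_error(p_hat):
    # p_hat: estimated class proportions in one region (node); they sum to 1.
    # Error rate = 1 minus the proportion of the most common class.
    p_hat = np.asarray(p_hat, dtype=float)
    return 1.0 - p_hat.max()

# A node where 70% of training observations are in the most common class:
print(classification_error([0.7, 0.2, 0.1]))  # 0.3
```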
The Gini index is a measure of node purity: the lower the Gini index, the purer the node. It is defined by

$$G = \sum_{k=1}^{K} \hat{p}_{mk}\left(1 - \hat{p}_{mk}\right),$$

where $\hat{p}_{mk}$ again represents the proportion of training observations in the $m$th region that are from the $k$th class.
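A matching sketch for the Gini index, under the same assumptions (illustrative function name, node proportions supplied directly):

```python
import numpy as np

def gini_index(p_hat):
    # Gini index = sum over classes of p_mk * (1 - p_mk);
    # it is small when every p_mk is close to 0 or 1 (a pure node).
    p_hat = np.asarray(p_hat, dtype=float)
    return float(np.sum(p_hat * (1.0 - p_hat)))

print(gini_index([0.7, 0.2, 0.1]))   # 0.46
print(gini_index([1.0, 0.0, 0.0]))   # 0.0 (perfectly pure node)
```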
Cross-entropy is also a measure of node purity: the lower the cross-entropy, the purer the node. It is defined by

$$D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk},$$

where $\hat{p}_{mk}$ again represents the proportion of training observations in the $m$th region that are from the $k$th class.
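And a sketch for cross-entropy, again with an illustrative function name and the convention that empty classes contribute nothing:

```python
import numpy as np

def cross_entropy(p_hat):
    # Cross-entropy = -sum over classes of p_mk * log(p_mk),
    # with 0 * log(0) treated as 0 so empty classes contribute nothing.
    p_hat = np.asarray(p_hat, dtype=float)
    nz = p_hat[p_hat > 0]
    return float(-np.sum(nz * np.log(nz)))

print(cross_entropy([0.7, 0.2, 0.1]))   # about 0.802
print(cross_entropy([1.0, 0.0, 0.0]))   # 0.0 (perfectly pure node)
```

For the same node proportions [0.7, 0.2, 0.1], the three measures evaluate to 0.3, 0.46, and about 0.80 respectively; all three shrink toward 0 as the node becomes purer.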