Decision Trees
Decision Tree Learning
Decision Tree Representation
PlayTennis: This decision tree classifies Saturday mornings according to whether or not they are suitable for playing tennis
Decision Tree Representation - Classification
Appropriate problems for decision tree learning
Basic Decision Tree Learning Algorithm: ID3
ID3 Algorithm
Top-Down induction of decision trees
Which attribute is the best classifier?
Entropy
Entropy measures the impurity of a collection of examples S:

Entropy(S) = Σi -pi log2 pi

where pi is the proportion of S belonging to class i. For a boolean classification, the entropy varies between 0 and 1: if all the members belong to the same class, the entropy is 0; if there are equal numbers of positive and negative examples, the entropy is 1.
Entropy - Example
For a collection S of 64 examples with 29 positive and 35 negative ([29+, 35-]):
Entropy(S) = -(29/64) log2(29/64) - (35/64) log2(35/64) = 0.994
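As a quick check, the entropy computation can be sketched in Python (the function name and interface are ours, not from the lecture):

```python
import math

def entropy(pos, neg):
    """Entropy of a boolean-classified sample with `pos` positive
    and `neg` negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is taken as 0
            p = count / total
            e -= p * math.log2(p)
    return e

print(round(entropy(29, 35), 3))  # 0.994, matching the example above
print(entropy(32, 32))            # 1.0 (equal split)
print(entropy(64, 0))             # 0.0 (pure sample)
```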
Information Gain
The measure of the effectiveness of an attribute in classifying the training data is called information gain: the expected reduction in entropy caused by partitioning the examples according to the attribute.

Gain(S, A) = Entropy(S) - Σ (|Sv|/|S|) Entropy(Sv), summed over v in Values(A)

where Values(A) is the set of all possible values of attribute A, and Sv is the subset of S for which attribute A has value v.
Information Gain - Example
Attribute A1 splits S = [29+, 35-] into subsets of 26 and 38 examples (entropies 0.706 and 0.742):
Gain(S, A1) = 0.994 - (26/64)(0.706) - (38/64)(0.742) = 0.266
Attribute A2 splits S into subsets of 51 and 13 examples (entropies 0.937 and 0.619):
Gain(S, A2) = 0.994 - (51/64)(0.937) - (13/64)(0.619) = 0.121
So A1 provides the greater information gain and would be selected.
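The gain computation can be reproduced in Python. The per-branch class counts below ([21+, 5-], [8+, 30-], [18+, 33-], [11+, 2-]) are assumed from the standard textbook version of this example; they reproduce the branch entropies 0.706, 0.742, 0.937, and 0.619 used above:

```python
import math

def entropy(pos, neg):
    """Entropy of a sample with pos positive and neg negative examples."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

def information_gain(parent, splits):
    """parent: (pos, neg) counts; splits: one (pos, neg) pair per attribute value."""
    total = sum(parent)
    return entropy(*parent) - sum(
        (p + n) / total * entropy(p, n) for p, n in splits)

# A1 splits [29+, 35-] into [21+, 5-] and [8+, 30-]
print(round(information_gain((29, 35), [(21, 5), (8, 30)]), 3))   # 0.266
# A2 splits [29+, 35-] into [18+, 33-] and [11+, 2-]
print(round(information_gain((29, 35), [(18, 33), (11, 2)]), 3))  # 0.121
```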
Hypothesis Space Search in Decision Tree Learning
ID3's capabilities and limitations follow from its hypothesis space and its search strategy.
INDUCTIVE BIAS IN DECISION TREE LEARNING
Approximate inductive bias of ID3
Restriction Biases and Preference Biases
A bias that takes the form of a categorical restriction on the set of hypotheses considered is typically called a restriction bias (or, alternatively, a language bias). In contrast, ID3 exhibits a preference bias: it searches a complete hypothesis space but prefers certain hypotheses (shorter trees) over others.
Why Prefer Short Hypotheses?
ISSUES IN DECISION TREE LEARNING
REDUCED ERROR PRUNING
RULE POST-PRUNING
Avoiding Overfitting the Data
Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h' ∈ H, such that h has smaller error than h' over the training examples, but h' has a smaller error than h over the entire distribution of instances.
There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:
1. Approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.
2. Approaches that allow the tree to overfit the data, and then post-prune the tree.
Regardless of which class is used, a key question is how to determine the correct final tree size. Approaches include:
Use a separate set of examples, distinct from the training examples, to evaluate the utility of post-pruning nodes from the tree.
Use all the available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set. For example, Quinlan (1986) uses a chi-square test to estimate whether further expanding a node is likely to improve performance over the entire instance distribution, or only on the current sample of training data.
Use an explicit measure of the complexity for encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized. This approach is based on a heuristic called the Minimum Description Length principle.
The available data are separated into two sets of examples: a training set, which is used to form the learned hypothesis, and a separate validation set, which is used to evaluate the accuracy of this hypothesis over subsequent data and, in particular, to evaluate the impact of pruning this hypothesis.
One common heuristic is to withhold one-third of the available examples for the validation set, using the other two-thirds for training.
REDUCED ERROR PRUNING
One approach, called reduced-error pruning (Quinlan 1987), is to consider each of the decision nodes in the tree as a candidate for pruning. Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node.
Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.
The major drawback of this approach is that when data is limited, withholding part of it for the validation set reduces even further the number of examples available for training.
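The pruning procedure can be sketched in Python. This is a simplified version: the node layout (a dict with an `attr`, per-value `branches`, and a stored `majority` training label) is our own, and each node is judged only on the validation examples that reach it, rather than re-scoring the whole tree after each candidate prune as reduced-error pruning strictly does:

```python
def classify(tree, x):
    """Follow attribute tests until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["attr"]]]
    return tree

def prune(tree, validation):
    """Bottom-up reduced-error pruning (simplified sketch)."""
    if not isinstance(tree, dict):
        return tree  # already a leaf
    # First prune each subtree, on the validation examples routed to it.
    for value, sub in tree["branches"].items():
        routed = [(x, y) for x, y in validation if x[tree["attr"]] == value]
        tree["branches"][value] = prune(sub, routed)
    if not validation:
        return tree
    subtree_correct = sum(classify(tree, x) == y for x, y in validation)
    leaf_correct = sum(y == tree["majority"] for _, y in validation)
    # Replace the node by a majority leaf if that performs no worse.
    return tree["majority"] if leaf_correct >= subtree_correct else tree

# Hypothetical one-node tree: on this validation set the humidity test
# misclassifies one example, so the node collapses to the leaf "No".
tree = {"attr": "humidity",
        "branches": {"high": "No", "normal": "Yes"},
        "majority": "No"}
val = [({"humidity": "high"}, "No"), ({"humidity": "normal"}, "No")]
print(prune(tree, val))  # No
```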
RULE POST-PRUNING
One quite successful method for finding high accuracy hypotheses is a technique called rule post-pruning.
1. Infer the decision tree from the training set, growing the tree until the training data is fit as well as possible and allowing overfitting to occur.
2. Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.
3. Prune (generalize) each rule by removing any preconditions that result in improving its estimated accuracy.
4. Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
In rule post-pruning, one rule is generated for each leaf node in the tree. Each attribute test along the path from the root to the leaf becomes a rule antecedent (precondition) and the classification at the leaf node becomes the rule consequent (postcondition). For example:
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
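Step 2 of rule post-pruning (one rule per root-to-leaf path) can be sketched as follows; the nested-dict tree encoding and the partial PlayTennis tree are our own illustration, not from the lecture:

```python
def tree_to_rules(tree, path=()):
    """Return one (preconditions, class) pair per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(path, tree)]  # leaf: the accumulated path is the rule body
    rules = []
    for value, sub in tree["branches"].items():
        rules += tree_to_rules(sub, path + ((tree["attr"], value),))
    return rules

# Fragment of the PlayTennis tree (sunny and overcast branches only).
tree = {"attr": "Outlook", "branches": {
    "Sunny": {"attr": "Humidity",
              "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes"}}

for pre, label in tree_to_rules(tree):
    conds = " ∧ ".join(f"({a} = {v})" for a, v in pre)
    print(f"IF {conds} THEN PlayTennis = {label}")
# First line: IF (Outlook = Sunny) ∧ (Humidity = High) THEN PlayTennis = No
```

Each rule can then be pruned independently (step 3), which is more flexible than pruning the tree itself: removing a precondition from one rule does not affect the other paths.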
Incorporating Continuous-Valued Attributes
Alternative Measures for Selecting Attributes
One alternative measure is the gain ratio (Quinlan 1986), which penalizes attributes with many values by dividing the gain by a term called split information:

SplitInformation(S, A) = -Σ (|Si|/|S|) log2(|Si|/|S|), summed over i = 1..c

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

where S1 through Sc are the c subsets of examples resulting from partitioning S by the c-valued attribute A.
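A minimal sketch of the split-information penalty, reusing the subset sizes 26 and 38 from the A1 example earlier (function names are ours):

```python
import math

def split_information(subset_sizes):
    """SplitInformation(S, A): entropy of S with respect to the
    c-way partition induced by attribute A."""
    total = sum(subset_sizes)
    return -sum(s / total * math.log2(s / total) for s in subset_sizes if s)

def gain_ratio(gain, subset_sizes):
    """Gain(S, A) divided by the split information of the partition."""
    return gain / split_information(subset_sizes)

print(split_information([32, 32]))            # 1.0 for an even two-way split
print(round(split_information([26, 38]), 3))  # 0.974 for the A1 partition
print(round(gain_ratio(0.266, [26, 38]), 3))  # 0.273
```

Note that split information grows with the number of values: a many-valued attribute that fragments S into many small subsets is penalized even if its raw gain is high.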
Handling Training Examples with Missing Attribute Values
Handling Attributes with Differing Costs
One such system applies this idea to learning medical diagnosis rules. Here the attributes are different symptoms and laboratory tests with differing costs. The system uses a somewhat different attribute selection measure, one that takes attribute costs into account rather than using information gain alone. Nunez (1991) presents an empirical comparison of these approaches over a range of tasks.
Thank you