Swayam Prabha
Course Title
Multivariate Data Mining- Methods and Applications
Lecture 33
Recursive Partitioning: Decision Trees
By
Anoop Chaturvedi
Department of Statistics, University of Allahabad
Prayagraj (India)
Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha
Classification
Examples
Data Mining_Anoop Chaturvedi
Decision Tree or Classification Tree
Node ⇒ Subset of the set of variables. Can be a terminal or non-terminal node.
Non-terminal node or parent node ⇒ Node that can split into two daughter nodes.
Terminal node or leaf node ⇒ Node that does not split.
The decision tree is a binary tree. Each non-terminal node splits into two daughter nodes. The two daughters of the root node are themselves the roots of two disjoint binary trees, called the left subtree and the right subtree of the root.
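The node terminology above can be sketched as a small data structure. This is only an illustrative sketch: the class name `Node`, its fields, and the helper functions are hypothetical and do not come from the slides.

```python
class Node:
    """A node of a binary decision tree (hypothetical illustration)."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left    # left daughter node, or None
        self.right = right  # right daughter node, or None

    def is_terminal(self):
        # A terminal (leaf) node has no daughter nodes.
        return self.left is None and self.right is None


def count_terminal_nodes(node):
    """Number of terminal nodes in the tree rooted at `node`."""
    if node.is_terminal():
        return 1
    return count_terminal_nodes(node.left) + count_terminal_nodes(node.right)


def count_splits(node):
    """Number of non-terminal (parent) nodes, i.e. the number of splits."""
    if node.is_terminal():
        return 0
    return 1 + count_splits(node.left) + count_splits(node.right)


# A root whose left subtree is a single leaf and whose right subtree
# splits once more.
root = Node(
    "root",
    left=Node("leaf A"),
    right=Node("parent", left=Node("leaf B"), right=Node("leaf C")),
)
print(count_terminal_nodes(root), count_splits(root))
```

In a binary tree where every parent has exactly two daughters, the number of terminal nodes is always one more than the number of splits.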
Example: Decision tree with two input variables, five terminal nodes and four splits.
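A tree of this shape can be sketched in code as follows. The figure itself is not reproduced here, so the split variables, the cut points, and the region names R1 to R5 below are made-up values chosen only to give four splits and five terminal nodes on two inputs; they are not the values from the slide.

```python
# Each non-terminal node is a dict holding the split variable, the cut point
# and the two daughters; each terminal node is a string naming its region.
# All thresholds and names are hypothetical.
tree = {
    "var": "x1", "cut": 0.5,
    "left": {"var": "x2", "cut": 0.3, "left": "R1", "right": "R2"},
    "right": {
        "var": "x1", "cut": 0.8,
        "left": "R3",
        "right": {"var": "x2", "cut": 0.6, "left": "R4", "right": "R5"},
    },
}


def assign_region(node, point):
    """Drop a point down the tree: go left if value <= cut, else right."""
    while isinstance(node, dict):
        if point[node["var"]] <= node["cut"]:
            node = node["left"]
        else:
            node = node["right"]
    return node


def count_nodes(node):
    """Return (number of splits, number of terminal nodes)."""
    if not isinstance(node, dict):
        return 0, 1
    left_splits, left_leaves = count_nodes(node["left"])
    right_splits, right_leaves = count_nodes(node["right"])
    return 1 + left_splits + right_splits, left_leaves + right_leaves


print(count_nodes(tree))
print(assign_region(tree, {"x1": 0.2, "x2": 0.1}))
print(assign_region(tree, {"x1": 0.9, "x2": 0.9}))
```

Because every split is of the form "variable ≤ cut point", the five terminal nodes correspond to five disjoint rectangles that together cover the plane of the two input variables.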
Node impurity functions for the two-class case.
Entropy function (rescaled) ⇒ Red curve
Gini index ⇒ Green curve
Re-substitution estimate of the misclassification rate ⇒ blue curve.
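The three curves can be evaluated numerically with a short sketch. It uses the standard two-class forms, with p denoting the proportion of one class in the node. The formulas on the slide are not available in this text, so the exact rescaling of the entropy is an assumption: here it is half the base-2 entropy, so that all three functions equal 0.5 at p = 0.5. The function names are hypothetical.

```python
import math


def entropy_rescaled(p):
    """Half the base-2 entropy of a two-class node (assumed rescaling)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pure node has zero impurity
    return 0.5 * (-p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p))


def gini_index(p):
    """Gini index for two classes: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


def misclassification_rate(p):
    """Re-substitution estimate: proportion in the minority class."""
    return min(p, 1.0 - p)


for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(
        f"p={p:.2f}  entropy={entropy_rescaled(p):.4f}  "
        f"gini={gini_index(p):.4f}  misclass={misclassification_rate(p):.4f}"
    )
```

All three functions vanish for a pure node (p = 0 or p = 1), are symmetric about p = 0.5, and reach their maximum there.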
[Table: two-way table with margins labelled L, R and Total and class labels 0 and 1; cell entries not recoverable.]