Machine Learning I:
Decision Trees
Patrick Hall
Visiting Faculty, Department of Decision Sciences
George Washington University
Lecture 4 Agenda
Where are we in the modeling lifecycle?
Data Collection & ETL
Feature Selection & Engineering
Supervised Learning
Unsupervised Learning
Deployment
Cost Intensive
Revenue Generating
Assessment & Validation
Brief Introduction
General Overview
Decision Tree Anatomy
Decision Tree Structure: Simple Example
Decision Tree Algorithm
Greedy Heuristics, Impurity, Information Gain, & Tree Pruning
Basic Algorithm Overview
Sources: Introduction to Data Mining and Introduction to Statistical Learning
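The sources describe tree induction as a greedy, recursive partitioning procedure: pick the single best split at each node, partition the records, and recurse. A minimal sketch of that idea, assuming Gini impurity and numeric features; every name here (grow_tree, best_split, the dict node layout) is illustrative, not taken from the texts.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Greedy step: try every (feature, threshold) pair and keep the one
    with the lowest weighted child impurity."""
    best_f, best_t, best_cost = None, None, float("inf")
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] < t]
            right = [i for i, r in enumerate(rows) if r[f] >= t]
            if not left or not right:
                continue
            cost = (len(left) / n) * gini([labels[i] for i in left]) \
                 + (len(right) / n) * gini([labels[i] for i in right])
            if cost < best_cost:
                best_f, best_t, best_cost = f, t, cost
    return best_f, best_t

def grow_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Recurse until a stopping criterion fires, then emit a majority-class leaf."""
    if len(set(labels)) == 1 or len(rows) < min_size or depth >= max_depth:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = best_split(rows, labels)
    if f is None:  # no valid split left
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    left = [i for i, r in enumerate(rows) if r[f] < t]
    right = [i for i, r in enumerate(rows) if r[f] >= t]
    return {"feature": f, "threshold": t,
            "left": grow_tree([rows[i] for i in left], [labels[i] for i in left],
                              depth + 1, max_depth, min_size),
            "right": grow_tree([rows[i] for i in right], [labels[i] for i in right],
                               depth + 1, max_depth, min_size)}

print(grow_tree([[2.7], [1.3], [3.6], [0.5]], ["yes", "no", "yes", "no"]))
```

Note the two greedy properties on display: each split is chosen myopically, with no lookahead, and a split, once made, is never revisited.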
Training for Classification
Source: Elements of Statistical Learning
Recursive Binary Splitting Method
Cost Function: Impurity Test
Source: Elements of Statistical Learning
Impurity Measures of a Classification Tree
Source: Introduction to Data Mining
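For reference, the three impurity measures discussed here, written as plain functions of a node's class proportions (a sketch matching the usual textbook definitions; all three are zero at a pure node and maximal at the uniform distribution):

```python
import math

def gini(p):
    """Gini index: 1 - sum(p_i^2) over the class proportions p."""
    return 1.0 - sum(pi ** 2 for pi in p)

def entropy(p):
    """Entropy: -sum(p_i * log2(p_i)), with 0 * log(0) taken as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def classification_error(p):
    """Misclassification error: 1 - max(p_i)."""
    return 1.0 - max(p)

print(gini([0.5, 0.5]), entropy([0.5, 0.5]), classification_error([0.5, 0.5]))
# 0.5 1.0 0.5  -- the two-class maximum of each measure
print(gini([1.0, 0.0]), entropy([1.0, 0.0]), classification_error([1.0, 0.0]))
# 0.0 0.0 0.0  -- a pure node
```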
Impurity Computation: At a Single Node
Source: Introduction to Data Mining
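As a worked example (the counts are chosen here for illustration, not taken from the slides): a node holding 3 records of class 0 and 3 of class 1 has class proportions (1/2, 1/2), so

```latex
\mathrm{Gini} = 1 - \left(\tfrac{3}{6}\right)^{2} - \left(\tfrac{3}{6}\right)^{2} = 0.5,
\qquad
\mathrm{Entropy} = -\tfrac{1}{2}\log_{2}\tfrac{1}{2} - \tfrac{1}{2}\log_{2}\tfrac{1}{2} = 1,
\qquad
\mathrm{Error} = 1 - \max\!\left(\tfrac{1}{2},\, \tfrac{1}{2}\right) = 0.5.
```

A pure node (all 6 records in one class) scores 0 under all three measures.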
Impurity Computation: Collective Child Nodes
Source: Introduction to Data Mining
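The collective impurity of a split is the impurity of its child nodes weighted by their record counts, sum over children of (N_k / N) * I(child_k). A small sketch with made-up counts:

```python
def gini(counts):
    """Gini impurity from raw class counts at a node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_child_impurity(children):
    """children: one class-count list per child node, e.g. [[3, 1], [0, 2]]."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# A split sending (3 class-0, 1 class-1) left and (0, 2) right:
print(weighted_child_impurity([[3, 1], [0, 2]]))  # 4/6 * 0.375 + 2/6 * 0 = 0.25
```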
Feature Split Test Condition: Information Gain
(Information gain is the mutual information between the class variable and the splitting variable)
Source: Introduction to Data Mining
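Information gain is the parent's impurity minus the collective impurity of its children, Delta = I(parent) - sum_k (N_k / N) * I(child_k); with entropy as the impurity measure this is exactly the mutual information noted above. A sketch reusing the same made-up counts:

```python
import math

def entropy(counts):
    """Entropy from raw class counts; empty classes contribute 0."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def information_gain(parent, children):
    """parent: class counts at the node; children: class counts per child."""
    n = sum(parent)
    weighted = sum(sum(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Splitting a (3, 3) parent into (3, 1) and (0, 2) children:
print(information_gain([3, 3], [[3, 1], [0, 2]]))  # about 0.459 bits
```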
Goodness of a Feature Test Condition: Information Gain
Stopping Criterion and Tree Pruning
Source: Introduction to Statistical Learning
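The cited chapter grows a large tree and then prunes it back with cost-complexity (weakest-link) pruning: for each value of the tuning parameter alpha >= 0, find the subtree T of the full tree T_0 that minimizes (written here in the regression form; classification trees substitute a misclassification or impurity term for the squared error)

```latex
\sum_{m=1}^{|T|} \; \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^{2} \;+\; \alpha\,|T|,
```

where |T| is the number of terminal nodes, R_m is the region of the m-th terminal node, and y-hat_{R_m} is its prediction. Larger alpha buys simplicity at the price of training fit; alpha itself is typically chosen by validation or cross-validation.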
Preference Bias: Ockham’s Razor
The simplest consistent explanation is the best: the smallest decision tree that correctly classifies most of the training examples is the preferred tree.
Source: Patrick Hall, Decision Trees, GWU
Tree Pruning via Validation Error
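One concrete way to prune by validation error (the library choice is ours for illustration; the slides do not prescribe a tool): scikit-learn exposes the cost-complexity pruning path, so we can fit one tree per candidate alpha and keep the tree with the lowest validation error.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Candidate alphas from the cost-complexity pruning path of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Keep the pruned tree with the highest validation accuracy
# (equivalently, the lowest validation error).
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_n_leaves(), best.score(X_val, y_val))
```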
Decision Tree Model Scoring
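Scoring a fitted tree is cheap: route each new record from the root through the split tests until it reaches a leaf, then return that leaf's prediction. A sketch using the same illustrative dict layout as the earlier growing sketch:

```python
def predict(node, row):
    """Descend from the root, taking the branch each split test selects."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

tree = {"feature": 0, "threshold": 2.5,
        "left": {"leaf": "A"}, "right": {"leaf": "B"}}
print(predict(tree, [1.0]))  # "A"
print(predict(tree, [3.0]))  # "B"
```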
Variable Importance
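Impurity-based variable importance credits each feature with the total impurity reduction its splits achieve, summed over the tree and usually normalized. A quick sketch with scikit-learn, which exposes this directly; the library and dataset are our illustration choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# feature_importances_ holds the normalized impurity decrease per feature.
for name, imp in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```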
Characteristics of Decision Trees
Advantages
Disadvantages
Banknote Case
Using Python
Source: Machine Learning Algorithms from Scratch
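A minimal sketch of the banknote case setup: the UCI banknote authentication data (assumed saved locally as data_banknote_authentication.txt, comma-separated, no header) has four image-derived features and a 0/1 class label. The cited book builds CART from scratch; for brevity this sketch fits a scikit-learn tree instead.

```python
import csv
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

with open("data_banknote_authentication.txt") as f:
    rows = [[float(v) for v in line] for line in csv.reader(f) if line]

X = [r[:-1] for r in rows]      # variance, skewness, kurtosis, entropy of the image
y = [int(r[-1]) for r in rows]  # 0/1 authenticity label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
print(f"holdout accuracy: {model.score(X_te, y_te):.3f}")
```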
Training with k-Fold Cross-Validation
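k-fold cross-validation in the from-scratch spirit of the cited book: shuffle the row indices, cut them into k folds, train on k - 1 folds, and score on the held-out fold, averaging across folds. A sketch; all names are ours.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, score, X, y, k=5):
    """Mean held-out score over k train/test splits."""
    folds = kfold_indices(len(X), k)
    scores = []
    for test_idx in folds:
        train_idx = [j for f in folds if f is not test_idx for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        scores.append(score(model,
                            [X[j] for j in test_idx],
                            [y[j] for j in test_idx]))
    return sum(scores) / k

# Tiny demo with a trivial majority-class "model":
X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
fit = lambda X_tr, y_tr: max(set(y_tr), key=y_tr.count)
score = lambda m, X_te, y_te: sum(int(m == t) for t in y_te) / len(y_te)
print(cross_validate(fit, score, X, y, k=5))
```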
Reading