An Investigation and Predictive Analysis of Tree Cover Types in Roosevelt National Forest
Jacob Swe, Iva Porfirova, Lee Ding, Derek Huang
The Plan
Forest Conservation
Cartographic measurements of distance and orientation for each forest cover observation (10 continuous variables in total):
Elevation
Aspect
Slope
Horizontal_Distance_To_Hydrology
Vertical_Distance_To_Hydrology
Horizontal_Distance_To_Roadways
Hillshade_9am
Hillshade_Noon
Hillshade_3pm
Horizontal_Distance_To_Fire_Points
Continuous Variables
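The slides do not say whether the continuous variables were rescaled. The following is a minimal sketch that assumes the data sits in a pandas DataFrame using the column names listed above. It standardizes the 10 continuous columns and passes the 0/1 indicator columns through unchanged. The helper name `make_preprocessor` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# The 10 continuous cartographic variables listed on the slide.
CONTINUOUS = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]

def make_preprocessor() -> ColumnTransformer:
    """Hypothetical helper: standardize the continuous columns and pass every
    other column (the binary indicators) through untouched."""
    return ColumnTransformer(
        [("scale", StandardScaler(), CONTINUOUS)],
        remainder="passthrough",
    )

# Tiny synthetic frame (random numbers, not real survey data) to show shapes.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(6, len(CONTINUOUS))), columns=CONTINUOUS)
demo["Wilderness_Area1"] = [1, 0, 0, 1, 0, 0]

out = make_preprocessor().fit_transform(demo)
print(out.shape)  # (6, 11): 10 scaled columns first, then the passthrough column
```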
Categorical variables for wilderness area (4) and soil type (40), encoded as binary indicator columns:
Wilderness_Area1 (Rawah)
Wilderness_Area2 (Neota)
Wilderness_Area3 (Comanche)
Wilderness_Area4 (Cache la Poudre)
Soil_Type1
…
Soil_Type40
Binary Variables
0 = absence
1 = presence
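For plotting or summary tables, it can help to collapse the 0/1 indicator columns back into a single label per observation. The following is a minimal sketch under that assumption. `decode_one_hot` is a hypothetical helper, and the check that exactly one indicator is set per row is a defensive assumption rather than something stated on the slide.

```python
import pandas as pd

# Wilderness area names as given on the slide.
WILDERNESS_NAMES = {"1": "Rawah", "2": "Neota", "3": "Comanche", "4": "Cache la Poudre"}

def decode_one_hot(df: pd.DataFrame, prefix: str) -> pd.Series:
    """Collapse the 0/1 columns whose names start with `prefix` into one label.

    Raises ValueError if any row does not have exactly one indicator set to 1.
    """
    cols = [c for c in df.columns if c.startswith(prefix)]
    indicators = df[cols]
    if not (indicators.sum(axis=1) == 1).all():
        raise ValueError(f"some rows do not have exactly one active {prefix} column")
    # idxmax(axis=1) gives the name of the column holding the 1; strip the prefix.
    return indicators.idxmax(axis=1).str[len(prefix):]

demo = pd.DataFrame({
    "Wilderness_Area1": [1, 0, 0],
    "Wilderness_Area2": [0, 0, 1],
    "Wilderness_Area3": [0, 1, 0],
    "Wilderness_Area4": [0, 0, 0],
})
labels = decode_one_hot(demo, "Wilderness_Area").map(WILDERNESS_NAMES)
print(labels.tolist())  # ['Rawah', 'Comanche', 'Neota']
```

The same call with `prefix="Soil_Type"` would yield soil type numbers "1" through "40".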
Visual Analysis
Clustering
Our Approach to Modeling
Data Partitioning:
3-fold CV
Hyperparameter Tuning:
CV Grid Search
Resampling:
Rebalance Class Membership
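The partitioning and tuning steps above (a holdout split plus a 3-fold cross-validated grid search) can be sketched with scikit-learn. The following assumes a decision tree as the model and uses a small synthetic multiclass dataset in place of the forest data. The grid values are illustrative and not the authors' actual search space. The rebalancing step is left out here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic multiclass stand-in for the cover type data.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# Keep a holdout set aside; only the training part is used in the grid search.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"criterion": ["gini", "entropy"],   # illustrative grid
                "max_depth": [5, 15, 25],
                "min_samples_leaf": [1, 5, 10]},
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),  # 3-fold CV
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print("3-fold CV accuracy:", round(search.best_score_, 4))
print("Holdout accuracy:", round(search.score(X_hold, y_hold), 4))
```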
Feature Selection:
CV Recursive Feature Elimination
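Cross-validated recursive feature elimination repeatedly drops the least important feature and keeps the subset with the best cross-validated score. The following is a minimal sketch using scikit-learn's `RFECV`. The slides do not name the base estimator, so the decision tree here is an assumption, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 12 features, of which only 4 carry signal.
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=1)

selector = RFECV(
    estimator=DecisionTreeClassifier(random_state=0),  # ranks features via feature_importances_
    step=1,                      # remove one feature per elimination round
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("kept mask:", selector.support_)
X_reduced = selector.transform(X)   # only the selected columns remain
```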
Linear Discriminant Analysis
LDA:
Regularization: 0.3,
Tolerance: 0.01
3-fold CV Accuracy: 0.6597
Holdout Accuracy: 0.6612
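The slide gives "Regularization" and "Tolerance" without naming a library. In scikit-learn's `LinearDiscriminantAnalysis`, covariance shrinkage only applies with the `lsqr` or `eigen` solvers, while `tol` is only used by the `svd` solver. The mapping below is therefore an assumption: it reads the regularization value as a fixed shrinkage of 0.3. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=4, random_state=2)

# Assumed mapping: "Regularization: 0.3" -> shrinkage=0.3 (needs lsqr or eigen).
# tol=0.01 is accepted but has no effect here, since only the "svd" solver uses it.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=0.3, tol=0.01)

scores = cross_val_score(lda, X, y, cv=3)   # stratified 3-fold CV for classifiers
print("3-fold CV accuracy:", round(scores.mean(), 4))
```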
Decision Tree Classifier
DTC:
Split Criterion: Entropy,
Max Depth: 25,
Min Samples in Leaf: 5
Holdout Accuracy: 0.92190
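The decision tree settings above map directly onto scikit-learn's `DecisionTreeClassifier`. The following is a minimal sketch of fitting that configuration and scoring it on a holdout split, with synthetic data standing in for the forest observations. The last lines confirm that the fitted tree respects the depth and leaf-size limits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=10, n_informative=6,
                           n_classes=3, random_state=5)
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Entropy split criterion, max depth 25, at least 5 training samples per leaf.
dtc = DecisionTreeClassifier(criterion="entropy", max_depth=25,
                             min_samples_leaf=5, random_state=0)
dtc.fit(X_tr, y_tr)
print("Holdout accuracy:", round(dtc.score(X_hold, y_hold), 4))

# Leaves are the nodes without a left child (-1 in the tree arrays).
leaves = dtc.tree_.children_left == -1
print("depth:", dtc.get_depth(), "smallest leaf:", dtc.tree_.n_node_samples[leaves].min())
```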
Random Forest Classifier
RFC:
Split Criterion: Entropy,
Max Depth: 15,
Min Samples in Leaf: 10
# Trees: 15
Holdout Accuracy: 0.82850
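The following is a minimal sketch comparing the random forest configuration above with the deeper one reported under Comparative Performance (max depth 25, min leaf 5). Both use 15 trees and entropy splits in scikit-learn's `RandomForestClassifier`. Synthetic data is used, so the printed accuracies say nothing about the forest data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=10, n_informative=6,
                           n_classes=3, random_state=6)
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

configs = {
    "initial": dict(max_depth=15, min_samples_leaf=10),
    "tuned": dict(max_depth=25, min_samples_leaf=5),
}
forests = {}
for name, params in configs.items():
    # 15 trees with entropy splits in both runs; only depth and leaf size differ.
    forests[name] = RandomForestClassifier(
        n_estimators=15, criterion="entropy", random_state=0, **params
    ).fit(X_tr, y_tr)
    print(name, "holdout accuracy:", round(forests[name].score(X_hold, y_hold), 4))
```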
Comparative Performance
DTC:
Split Criterion: Entropy,
Max Depth: 25,
Min Samples in Leaf: 5
Holdout Accuracy: 0.92190
RFC:
Split Criterion: Entropy,
Max Depth: 25,
Min Samples in Leaf: 5
# Trees: 15
Holdout Accuracy: 0.94908
Best Model Selection
One vs. Many
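"One vs. Many" may refer to scoring each cover type against all the others (one-vs-rest). If so, per-class scores show which types the model separates well. The following is a minimal sketch under that assumption. It computes one ROC AUC per class from a random forest's predicted probabilities on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=900, n_features=10, n_informative=6,
                           n_classes=3, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=15, criterion="entropy", max_depth=25,
                               min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)

# One binary "this class vs. every other class" target column per class.
y_bin = label_binarize(y_te, classes=model.classes_)
per_class_auc = roc_auc_score(y_bin, proba, average=None)
for cls, auc in zip(model.classes_, per_class_auc):
    print(f"class {cls} vs rest: ROC AUC = {auc:.3f}")
```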
Resampling
Interpolating New Observations
~500,000 samples ➝ ~1,500,000 samples
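The slides describe resampling by interpolating new observations but do not name a method. SMOTE is one common technique of this kind. It places each synthetic point on the line segment between a minority-class sample and one of its nearest same-class neighbours. The following NumPy sketch follows that idea. The helpers `interpolate_minority` and `rebalance` are hypothetical and assume every class has at least two samples.

```python
import numpy as np

def interpolate_minority(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """SMOTE-style oversampling: each new point lies between a random sample
    and one of its k nearest neighbours from the same class."""
    rng = np.random.default_rng(seed)
    # Pairwise squared distances within the class (O(n^2) memory, demo only).
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])

def rebalance(X: np.ndarray, y: np.ndarray, k: int = 5, seed: int = 0):
    """Oversample every class up to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for cls, cnt in zip(classes, counts):
        if cnt < target:
            new = interpolate_minority(X[y == cls], target - cnt, k, seed)
            Xs.append(new)
            ys.append(np.full(len(new), cls))
    return np.vstack(Xs), np.concatenate(ys)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = np.array([0] * 40 + [1] * 15 + [2] * 5)
X_bal, y_bal = rebalance(X, y)
print(np.bincount(y_bal))  # [40 40 40]
```

At the scale of this dataset, a nearest-neighbour index is needed instead of the dense distance matrix used here. The `SMOTE` class in imbalanced-learn (`fit_resample`) is one ready-made option.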
Changing the Goal
What does resampling do to accuracy calculations?
What metrics should we pay attention to?
How can we decide if the new model is better?
Tuning a New Model
Tuning objective changes from accuracy to precision
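In scikit-learn, switching the tuning objective from accuracy to precision amounts to changing the `scoring` argument of the search. The following is a minimal sketch assuming macro-averaged precision, which weights every class equally. The data is synthetic and deliberately imbalanced. The final score is taken on a holdout split that the search never sees. If the training data were resampled, the resampling would be applied only to the training part, so the holdout stays raw.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Imbalanced synthetic data with class proportions of roughly 0.7 / 0.2 / 0.1.
X, y = make_classification(n_samples=900, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=4)
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=15, criterion="entropy", random_state=0),
    param_grid={"max_depth": [15, 25], "min_samples_leaf": [5, 10]},
    cv=3,
    scoring="precision_macro",   # tune for macro precision instead of accuracy
)
search.fit(X_tr, y_tr)

hold_prec = precision_score(y_hold, search.predict(X_hold), average="macro", zero_division=0)
print(search.best_params_, "holdout macro precision:", round(hold_prec, 4))
```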
Evaluating Performance
Resampled
Raw Data
Accuracy: ~93% ➝ ~93%
ROC AUC: ~99% ➝ ~99%
Precision (Macro): ~96% ➝ ~97%
Precision-Recall AUC: ~96% ➝ ~98%
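The following is a minimal sketch of computing the four reported metrics for a fitted multiclass model. It assumes one-vs-rest, macro-averaged ROC AUC. It also assumes that "Precision-Recall AUC" is estimated with scikit-learn's `average_precision_score`, a step-wise summary of the precision-recall curve. The slides do not say how either was computed. `evaluate` is a hypothetical helper, and the data is synthetic. To compare models trained on raw and resampled data, score both with this function on the same holdout set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

def evaluate(model, X, y) -> dict:
    """Accuracy, one-vs-rest ROC AUC, macro precision and macro PR AUC."""
    proba = model.predict_proba(X)
    pred = model.predict(X)
    y_bin = label_binarize(y, classes=model.classes_)   # one column per class
    return {
        "accuracy": accuracy_score(y, pred),
        "roc_auc_ovr": roc_auc_score(y, proba, multi_class="ovr", average="macro"),
        "precision_macro": precision_score(y, pred, average="macro", zero_division=0),
        "pr_auc_macro": average_precision_score(y_bin, proba, average="macro"),
    }

X, y = make_classification(n_samples=900, n_features=10, n_informative=6,
                           n_classes=3, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=15, criterion="entropy", max_depth=25,
                               min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

metrics = evaluate(model, X_te, y_te)
print({name: round(value, 3) for name, value in metrics.items()})
```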
Final Conclusions
Questions & Answers