Decision Forest and Deep Neural Decision Forest

Christopher B. Choy (chrischoy@ai.stanford.edu)

2016/02/05 SDL-Reading Group Presentation

Table of Contents

  • Decision Forest
    • Testing
    • Training
  • Deep Neural Decision Forest
    • Formulation
    • Training
    • Experiments
  • CNN vs Deep-NDF vs Decision Forest
  • Fully Differentiable Deep Neural Decision Forest
    • Formulation
    • Training
    • Implementation
  • References

Decision Forest

Decision Forest

  • Classification, regression
  • Decision tree
    • Overfits easily
  • Ensemble Learning Method
    • Decision trees
    • Mean or mode of the individual tree predictions

Decision Forest: testing

Input: a data point v = (x₁, …, x_d) ∈ ℝ^d

Split function: a binary test h(v, θ_j) ∈ {0, 1} at split node j (notation following the Criminisi et al. report)

Given a previously unseen data point v a decision tree hierarchically applies split functions. Starting at the root, each split node applies its associated split function to v.

Depending on the result of the binary test, the data point is sent to the left or right child. This process is repeated until the data point reaches a leaf node.

Leaf nodes contain a predictor (e.g. a classifier or a regressor) that associates the input v with an output (e.g. a class label).
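
A minimal sketch of this routing procedure in plain Python (the dict-based node layout, field names, and example tree are my own illustration, not the report's notation):

```python
# Minimal sketch of decision-tree inference: route a data point v from the
# root to a leaf by repeatedly applying axis-aligned binary split functions.
# The node layout (dicts with "feature", "threshold", children, "predictor")
# is a hypothetical illustration, not the report's notation.

def predict(tree, v):
    node = tree  # start at the root
    while "predictor" not in node:                            # split nodes have no predictor
        goes_left = v[node["feature"]] <= node["threshold"]   # binary test h(v, theta)
        node = node["left"] if goes_left else node["right"]
    return node["predictor"]                                  # e.g. class distribution stored at the leaf

# Example: a depth-1 tree splitting on feature 0 at threshold 0.5.
tree = {
    "feature": 0, "threshold": 0.5,
    "left":  {"predictor": {"cat": 0.9, "dog": 0.1}},
    "right": {"predictor": {"cat": 0.2, "dog": 0.8}},
}
print(predict(tree, [0.3, 1.2]))  # -> {'cat': 0.9, 'dog': 0.1}
```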

Decision Forest: training

  • for each tree
    • select a subset from the training set
    • make a root node
    • while a terminal node satisfies the branching condition (e.g. its entropy is still high)
      • make a decision boundary along a (possibly rotated) axis that maximizes information gain (mutual information); see the sketch after this list
    • for each terminal (leaf) node, store a predictor
      • Discrete: a class distribution
      • Continuous: a scalar or vector estimate
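
A sketch of the split search mentioned above (NumPy, axis-aligned splits only; the randomized feature subset and the threshold enumeration are simplifications I chose, not the exact procedure from the report):

```python
# Minimal sketch of randomized node optimization: pick the axis-aligned split
# that maximizes information gain on the data reaching a node.
import numpy as np

def entropy(labels):
    """H(X) = -sum_x p(x) log2 p(x) over the empirical class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(X, y, n_random_features=5):
    """Search a random subset of axes; return (feature, threshold, information gain)."""
    best = (None, None, -np.inf)
    h_parent = entropy(y)
    subset = np.random.choice(X.shape[1], size=min(n_random_features, X.shape[1]), replace=False)
    for f in subset:
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # information gain = H(parent) - weighted H(children)
            h_children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = h_parent - h_children
            if gain > best[2]:
                best = (f, t, gain)
    return best

X = np.array([[0.1, 1.0], [0.4, 0.2], [0.9, 0.8], [0.7, 0.3]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # e.g. (0, 0.4, 1.0): splitting feature 0 at 0.4 separates the classes
```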

  • Why entropy?
    • Measure of information/uncertainty
      • e.g. a fair coin flip (head/tail): 1 bit
      • Number of bits needed to encode the event
    • Information of an event x: −log₂ p(x)
    • Entropy: the expected information, H(X) = −Σₓ p(x) log₂ p(x)
  • Why mutual information (information gain)?
    • Mutual information (information theory) = information gain (computer vision)
    • Information gain: I(X; Y) = H(X) − H(X | Y)
    • Easier to interpret as mutual information
    • Reduction in uncertainty about X once Y is observed
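
A tiny worked example of these quantities (the numbers are my own, purely for intuition):

$$H\left(\tfrac{1}{2},\tfrac{1}{2}\right) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}, \qquad I(X;Y) = H(X) - H(X \mid Y)$$

A node holding 4 + 4 samples of two classes has entropy 1 bit; a split that sends all of one class left and all of the other right produces children with entropy 0, so the information gain is the full 1 bit.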

Adding Randomness

  • Regularize overfitting
  • Training set sampling
  • Randomized node optimization
    • Random axis selection
  • Ensemble Model

Random Forest in Practice

  • Efficient human pose estimation from single depth images, CVPR 2012
    • Segmentation (classification), regression

Deep Neural Decision Forest

Deep Neural Decision Forest (Deep-NDF)

  • Random Forest
    • Multiclass/regression problem
    • Distributable
  • Neural Network
    • Representation Learning
    • End-to-end training using gradient backpropagation
    • Global optimality for positively homogeneous systems [Haeffele et al.]
    • Modular, hardware acceleration
  • Instead of applying weak classifiers to raw data, learn the features with backpropagation
    • Representation learning in the loop
  • Neural Decision Forests for Semantic Image Labelling, CVPR 2014
    • Features learned locally, independently
  • Deep Neural Decision Forests, ICCV 2015

Deep-NDF: NN + Forest

  • Decision Node
    • A sigmoid applied to an FC output gives the routing probability at each split node
    • e.g. the probability of reaching the third leaf is the product of the routing probabilities along its path

  • Leaf Node
    • Holds a probability distribution over class labels
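
Putting the two pieces together, the formulation from the ICCV 2015 paper (reproduced from memory, so treat the exact notation as a sketch): decision node n squashes a network output f_n through a sigmoid, leaf ℓ is reached with the product of the routing probabilities along its path, and the prediction mixes the leaf distributions:

$$d_n(x;\Theta) = \sigma\big(f_n(x;\Theta)\big), \qquad \mu_\ell(x) = \prod_{n \in \mathrm{path}(\ell)} d_n(x)^{\mathbf{1}[\ell \swarrow n]} \big(1 - d_n(x)\big)^{\mathbf{1}[n \searrow \ell]}, \qquad \mathbb{P}[y \mid x, \Theta, \pi] = \sum_{\ell} \pi_{\ell y}\, \mu_\ell(x)$$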

Deep-NDF

Deep-NDF: training

  • Decision nodes
    • Differentiable with respect to the network weights, so they are trained by backpropagation

  • Leaf nodes
    • Given all network weights, the leaf distributions have a closed-form update
    • Note that the update is a sum over the entire dataset
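
The closed-form step, as I recall it from the paper (an EM-style update over the training set 𝒯, with Z a per-leaf normalizer):

$$\pi_{\ell y}^{(t+1)} = \frac{1}{Z_\ell^{(t)}} \sum_{(x, y') \in \mathcal{T}} \mathbf{1}[y = y'] \, \frac{\pi_{\ell y}^{(t)} \, \mu_\ell(x \mid \Theta)}{\mathbb{P}\big[y \mid x, \Theta, \pi^{(t)}\big]}$$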

Deep-NDF: training

  • Alternating optimization: SGD on the network weights with the leaf distributions fixed, then the closed-form update of the leaf distributions with the network fixed

Deep-NDF: experiments

  • GoogLeNet*: connect features from each softmax to higher softmax layers
  • Randomly select 500 dims from prev layer

[Figure: dNDF.NET architecture: GoogLeNet* with 10 trees attached at each of its three softmax locations]

Caveat

  • 1k epochs using 100k mini-batches, on 12 hosts with K40 GPUs
    • GoogLeNet* batch size: 50
  • 10 trees (depth 15) at each softmax (3 softmaxes in total)
    • × 30 trees ≈ 1.97M
    • 500 × 1k features × 2M
  • From the 1k-dim feature vector, randomly select 500 elements to generate the decision functions f

  • 3 vs. 30 softmaxes, with random-subset FC features + 100k mini-batches, 1k epochs

Convexity comparison

  • CNN
    • Softmax, cross entropy
    • convex
  • Deep-NDF
    • Negative log of a product of sigmoids and a constant
    • A sum of exponential-family terms: convex
  • Convexity at the last loss layer matters
    • Positively homogeneous system with convex loss: global optima [Haeffele et al.]
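
For concreteness, the two per-sample losses being compared (standard forms in my own notation; the convexity claims above are the slide's, stated with respect to the last-layer activations):

$$L_{\mathrm{CNN}}(x, y) = -\log \frac{e^{f_y(x)}}{\sum_c e^{f_c(x)}}, \qquad L_{\mathrm{NDF}}(x, y) = -\log \sum_{\ell} \pi_{\ell y} \, \mu_\ell(x; \Theta)$$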

Leaf Node

Less uncertainty as training progresses

Comparison

|                                 | CNN       | Deep-NDF  | Decision Forest |
| Feature Learning                | Yes       | Yes       | No              |
| Divide-and-conquer              | No        | Yes       | Yes             |
| Mutual Information Maximization | No        | No        | Yes             |
| Ease of parallelization         | Difficult | Difficult | Easy            |
| Gradient Descent                | Yes       | Yes       | No              |
| Convexity of loss               | Convex    | Convex    | N/A             |

Fully Differentiable Deep-NDF

Leaf Node

  • Probability distribution stored at each leaf
  • Alternating optimization
    • The leaf distributions lag behind the network weights
  • Learn them with gradients instead
    • They must lie on the simplex
    • positive, sum to 1
    • Constrained optimization?
  • Parametrize the probabilities as a softmax over unconstrained parameters

  • Final loss: −log(product of sigmoids × softmax): a sum of convex functions, hence convex
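
A minimal NumPy sketch of the idea (my own simplification; names, shapes, and tree layout are mine, see the TensorFlow repository in the references for the actual implementation): the leaf distributions are the softmax of unconstrained parameters W, so they stay on the simplex and the whole loss can be trained in a single gradient pass.

```python
# Minimal NumPy sketch of fully differentiable leaf nodes for one tree:
# leaf distributions are softmax(W), routing probabilities are products of
# sigmoids, and the loss is the usual negative log-likelihood.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ndf_forward(decisions, W, y):
    """decisions: (n_split,) pre-sigmoid network outputs for one sample, laid
    out level by level (heap order) for a full binary tree; W: (n_leaf, n_class)
    unconstrained leaf parameters; y: true class index. Returns (p_y, loss)."""
    d = sigmoid(decisions)                  # routing prob. of going left at each split
    n_leaf = len(d) + 1
    mu = np.ones(n_leaf)                    # prob. of reaching each leaf
    for leaf in range(n_leaf):
        node, lo, hi = 0, 0, n_leaf         # walk from the root towards this leaf
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if leaf < mid:                  # leaf lies in the left subtree
                mu[leaf] *= d[node]
                node, hi = 2 * node + 1, mid
            else:                           # leaf lies in the right subtree
                mu[leaf] *= 1.0 - d[node]
                node, lo = 2 * node + 2, mid
    pi = softmax(W, axis=1)                 # leaf class distributions on the simplex
    p_y = (mu * pi[:, y]).sum()             # P[y | x] = sum_l pi_{l,y} mu_l(x)
    return p_y, -np.log(p_y)

# Example: a depth-2 tree (3 split nodes, 4 leaves), 3 classes.
p, loss = ndf_forward(np.array([0.2, -1.0, 0.5]), np.random.randn(4, 3), y=1)
print(p, loss)
```

Because π = softmax(W), gradients flow into W directly and the separate closed-form leaf update is no longer needed.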

Symbolic Math Library

  • Symbolic math library
    • Theano
    • autograd
    • TensorFlow
  • TensorFlow
    • Symbolic math
    • Fast compilation
    • Distributed computing

References

  • A. Criminisi, J. Shotton and E. Konukoglu, Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Microsoft Research Technical Report
  • P. Kontschieder, M. Fiterau, A. Criminisi and S. Rota Bulò, Deep Neural Decision Forests, ICCV 2015
  • B. D. Haeffele and R. Vidal, Global Optimality in Tensor Factorization, Deep Learning, and Beyond, arXiv 2015
  • https://github.com/chrischoy/fully-differentiable-deep-ndf-tf
