1 of 30

Decision Forest and Deep Neural Decision Forest

Christopher B. Choy (chrischoy@ai.stanford.edu)

2016/02/05 SDL-Reading Group Presentation

2 of 30

Table of Contents

  • Decision Forest
    • Testing
    • Training
  • Deep Neural Decision Forest
    • Formulation
    • Training
    • Experiments
  • CNN vs Deep-NDF vs Decision Forest
  • Fully Differentiable Deep Neural Decision Forest
    • Formulation
    • Training
    • Implementation
  • References

3 of 30

Decision Forest

4 of 30

Decision Forest

  • Classification, regression
  • Decision tree
    • Overfits easily
  • Ensemble Learning Method
    • Decision trees
    • Mean or mode of the individual tree predictions

5 of 30

Decision Forest: testing

Input: a feature vector x ∈ ℝ^d, routed from the root to a leaf; the prediction stored at that leaf is returned

Split function at each internal node: h(x; θ) ∈ {0, 1}, e.g. an axis-aligned threshold test that sends the sample to the left or right child
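
A minimal sketch of test-time routing, assuming axis-aligned split functions and trees stored as nested dicts; the layout and names are illustrative, not from the slides:

    import numpy as np

    def predict_tree(node, x):
        """Route x from the root to a leaf and return the stored prediction."""
        while 'leaf' not in node:
            child = 'right' if x[node['feature']] > node['threshold'] else 'left'
            node = node[child]
        return node['leaf']

    def predict_forest(trees, x):
        """Mean of the per-tree predictions (the ensemble output)."""
        return np.mean([predict_tree(t, x) for t in trees], axis=0)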

6 of 30

Decision Forest: training

  • for each tree (a minimal sketch follows below)
    • select a random subset of the training set
    • create a root node
    • while a terminal node satisfies a branching condition (e.g. its entropy is still high)
      • split it along a (possibly rotated) axis that maximizes the information gain (mutual information)
    • for each terminal (leaf) node, store a prediction
      • Discrete target: class distribution
      • Continuous target: scalar or vector
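
A minimal training sketch for the loop above, assuming axis-aligned splits chosen from randomly sampled (feature, threshold) candidates; all names and hyper-parameters are illustrative:

    import numpy as np

    def entropy(labels, n_classes):
        # Shannon entropy of the empirical class distribution, in bits
        p = np.bincount(labels, minlength=n_classes) / max(len(labels), 1)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def info_gain(labels, go_right, n_classes):
        # mutual information between the class label and the split decision
        left, right = labels[~go_right], labels[go_right]
        h_children = (len(left) * entropy(left, n_classes) +
                      len(right) * entropy(right, n_classes)) / len(labels)
        return entropy(labels, n_classes) - h_children

    def grow_tree(X, y, n_classes, depth=0, max_depth=10, n_candidates=32):
        # branching condition: stop when the node is pure or the tree is deep enough
        if depth == max_depth or entropy(y, n_classes) == 0.0:
            return {'leaf': np.bincount(y, minlength=n_classes) / len(y)}
        best = None
        for _ in range(n_candidates):            # randomized node optimization
            f = np.random.randint(X.shape[1])
            t = np.random.choice(X[:, f])
            go_right = X[:, f] > t
            if go_right.all() or (~go_right).all():
                continue
            gain = info_gain(y, go_right, n_classes)
            if best is None or gain > best[0]:
                best = (gain, f, t, go_right)
        if best is None:                          # no valid split found
            return {'leaf': np.bincount(y, minlength=n_classes) / len(y)}
        _, f, t, go_right = best
        return {'feature': f, 'threshold': t,
                'left':  grow_tree(X[~go_right], y[~go_right], n_classes, depth + 1, max_depth),
                'right': grow_tree(X[go_right],  y[go_right],  n_classes, depth + 1, max_depth)}

    def train_forest(X, y, n_classes, n_trees=10, subsample=0.7):
        trees = []
        for _ in range(n_trees):                  # training-set sampling (bagging)
            idx = np.random.choice(len(X), int(subsample * len(X)), replace=True)
            trees.append(grow_tree(X[idx], y[idx], n_classes))
        return trees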

7 of 30

  • Why entropy:
    • A measure of information/uncertainty
      • a fair coin (head/tail): 1 bit
      • the number of bits needed to encode the event
    • Expected information/uncertainty of an event: E[-log₂ p(x)]
    • Entropy: H(X) = -Σₓ p(x) log₂ p(x)
  • Why mutual information (information gain)
    • Mutual information (information theory) = information gain (computer vision)
    • Information gain: I(X; Y) = H(X) - H(X | Y)
    • Easier to interpret as mutual information:
    • the reduction in uncertainty about X once Y is observed (a short worked example follows below)
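
A short worked version of the two bullets above, using only the standard definitions:

    H(\text{fair coin}) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}

    I(X; Y) = H(X) - H(X \mid Y) = H(X) - \sum_y p(y)\, H(X \mid Y = y)

So a split (Y = which branch a sample takes) is good when, on average, knowing the branch removes as much uncertainty about the label X as possible.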

8 of 30

9 of 30

Adding Randomness

  • Regularize overfitting
  • Training set sampling
  • Randomized node optimization
    • Random axis selection
  • Ensemble Model

10 of 30

11 of 30

Random Forest in Practice

  • Efficient human pose estimation from single depth images, CVPR 2012
    • Segmentation (classification), regression

12 of 30

Deep Neural Decision Forest

13 of 30

Deep Neural Decision Forest (Deep-NDF)

  • Random Forest
    • Multiclass/regression problem
    • Distributable
  • Neural Network
    • Representation Learning
    • End-to-end training using gradient backpropagation
    • Global convergence for positively homogeneous systems [Haeffele et al.]
    • Modular, hardware acceleration
  • Instead of applying weak classifiers to raw data, learn the features with backpropagation
    • Representation learning in the loop
  • Neural Decision Forests for Semantic Image Labelling, CVPR 2014
    • Features learned locally, independently
  • Deep Neural Decision Forests, ICCV 2015

14 of 30

Deep-NDF: NN + Forest

  • Decision Node
    • sigmoid of an FC output: d_n(x) = σ(f_n(x))
    • e.g. the probability of reaching the third leaf node is the product of the routing probabilities d_n(x) or 1 - d_n(x) along its path

  • Leaf Node
    • Probability distribution π over the classes (routing sketch below)
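
A minimal routing sketch for a single depth-2 tree; the node/leaf indexing is illustrative, not from the slides:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def leaf_probabilities(f):
        """f: FC outputs wired to the three decision nodes n0, n1, n2."""
        d = sigmoid(f)                      # d[n] = p(go right at node n)
        mu = np.array([
            (1 - d[0]) * (1 - d[1]),        # leaf 0: left, left
            (1 - d[0]) * d[1],              # leaf 1: left, right
            d[0] * (1 - d[2]),              # leaf 2: right, left
            d[0] * d[2],                    # leaf 3: right, right
        ])
        return mu                           # sums to 1

    def predict(f, pi):
        """pi: per-leaf class distributions; prediction = sum_l mu_l * pi_l."""
        return leaf_probabilities(f) @ pi   # shape: (n_classes,)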

15 of 30

Deep-NDF

16 of 30

Deep-NDF: training

  • Decision nodes
    • Differentiable with respect to the network weights

  • Leaf nodes

  • Given all the network weights, the leaf distributions have a closed-form solution (see the update below)
  • Note that the update sums over the entire dataset
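
One way to write that update (a hedged paraphrase of the rule in Kontschieder et al.; μ_ℓ(x) is the probability of routing x to leaf ℓ, π_ℓy the leaf's probability of class y, Z_ℓ a normalizer over y):

    \pi_{\ell y}^{(t+1)} = \frac{1}{Z_\ell^{(t)}} \sum_{(x, y') \in \mathcal{T}:\, y' = y}
        \frac{\pi_{\ell y}^{(t)} \, \mu_\ell(x)}{\sum_{\ell'} \mu_{\ell'}(x) \, \pi_{\ell' y}^{(t)}}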

17 of 30

Deep-NDF: training

  • Alternating optimization: update the network weights by backprop with the leaf distributions fixed, then update the leaf distributions with the weights fixed (loop sketch below)
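
A toy, self-contained sketch of the alternating scheme for a single depth-2 tree on linear features f = x W (no deep network; numerical gradients stand in for backprop). Everything here is illustrative:

    import numpy as np

    rng = np.random.RandomState(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def leaf_mu(F):
        """Routing probabilities (N, 4) for decision-node activations F (N, 3)."""
        d = sigmoid(F)
        return np.stack([(1 - d[:, 0]) * (1 - d[:, 1]),
                         (1 - d[:, 0]) * d[:, 1],
                         d[:, 0] * (1 - d[:, 2]),
                         d[:, 0] * d[:, 2]], axis=1)

    def train(X, Y, n_classes, epochs=50, lr=0.1):
        N, D = X.shape
        W = 0.01 * rng.randn(D, 3)                     # weights of the 3 decision nodes
        pi = np.full((4, n_classes), 1.0 / n_classes)  # leaf class distributions
        onehot = np.eye(n_classes)[Y]
        for _ in range(epochs):
            # (1) pi fixed: gradient step on W (numerical gradient as a backprop stand-in)
            def loss(Wc):
                p = leaf_mu(X @ Wc) @ pi
                return -np.log(p[np.arange(N), Y] + 1e-12).mean()
            g, eps = np.zeros_like(W), 1e-5
            for i in np.ndindex(*W.shape):
                Wp = W.copy(); Wp[i] += eps
                g[i] = (loss(Wp) - loss(W)) / eps
            W -= lr * g
            # (2) W fixed: closed-form-style leaf update over the whole dataset
            mu = leaf_mu(X @ W)                        # (N, 4)
            p = mu @ pi                                # (N, C) predictive distribution
            resp = (mu[:, :, None] * pi[None]) / (p[:, None, :] + 1e-12)
            pi = (resp * onehot[:, None, :]).sum(0)
            pi /= pi.sum(1, keepdims=True) + 1e-12
        return W, pi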

18 of 30

Deep-NDF: experiments

  • GoogLeNet*: connect features from each softmax to higher softmax layers
  • Randomly select 500 dimensions from the previous layer

19 of 30

GoogLeNet*

20 of 30

GoogLeNet*

[Figure: dNDF.NET architecture: GoogLeNet* with 10 trees attached at each of the three softmax outputs]

21 of 30

Caveat

  • 1k epochs with 100k mini-batches on 12 hosts with K40 GPUs
    • GoogLeNet* batch size: 50
  • 10 trees (depth 15) at each softmax (3 softmaxes in total)
    • × 30 trees ≈ 1.97M (arithmetic below)
    • 500 × 1k features × ~2M
  • From the 1k-dimensional feature vector, randomly select 500 elements to generate the decision functions f

  • In effect: 3 softmax heads vs. 30 tree heads (on random FC-feature subsets), plus 100k mini-batches over 1k epochs
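
One way to read the ≈1.97M figure above (my arithmetic, not stated on the slide): counting all nodes of a depth-15 tree,

    2^{15} - 1 = 32{,}767 \text{ split nodes}, \quad 2^{15} = 32{,}768 \text{ leaves}, \quad
    65{,}535 \text{ nodes per tree} \times 30 \text{ trees} \approx 1.97\text{M}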

22 of 30

Convexity comparison

  • CNN
    • Softmax, cross-entropy loss
    • Convex
  • Deep-NDF
    • Negative log of a product of sigmoids and constants
    • Sum of exponential-family terms: convex
  • Convexity at the last loss layer matters (see the note below)
    • Positively homogeneous system with a convex loss: global optima [Haeffele et al.]
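
A brief check of the two convexity claims above, stated as standard facts (for Deep-NDF, written for a single routing path with the leaf distribution held constant):

    -\log \mathrm{softmax}(f)_y = -f_y + \log \sum_c e^{f_c} \quad \text{(log-sum-exp, convex in } f)

    -\log \sigma(f) = \log(1 + e^{-f}), \qquad -\log(1 - \sigma(f)) = \log(1 + e^{f}) \quad \text{(softplus, convex in } f)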

23 of 30

Leaf Node

24 of 30

Comparison

                                     CNN         Deep-NDF    Decision Forest
  Feature Learning                   Yes         Yes         No
  Divide-and-conquer                 No          Yes         Yes
  Mutual Information Maximization    No          No          Yes
  Ease of parallelization            Difficult   Difficult   Easy
  Gradient Descent                   Yes         Yes         No
  Convexity of loss                  Convex      Convex      N/A

25 of 30

Fully Differentiable Deep-NDF

26 of 30

Leaf Node

  • Leaf nodes hold probability distributions
  • Alternating optimization
    • the leaf distributions lag behind the network weights
  • Learn them with gradients instead
    • they must lie on the simplex
    • positive, summing to 1
    • constrained optimization?
  • Parametrize the probabilities with a softmax over unconstrained weights (sketch below)

  • Final loss: -log(sigmoid * softmax), a sum of convex functions: convex
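
A minimal sketch of that parametrization in plain NumPy (the linked repo uses TensorFlow); w_leaf is an unconstrained parameter learned jointly with the network, and mu_fn stands for the routing function from the earlier sketch:

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def loss(f, w_leaf, mu_fn, y):
        """-log sum_l mu_l(f) * softmax(w_leaf)[l, y]; differentiable in f and w_leaf."""
        pi = softmax(w_leaf, axis=1)        # (n_leaves, n_classes), always on the simplex
        p = mu_fn(f) @ pi                   # predictive class distribution
        return -np.log(p[y] + 1e-12)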

27 of 30

Symbolic Math Library

  • Symbolic math library
    • Theano
    • autograd
    • TensorFlow
  • TensorFlow
    • Symbolic math
    • Fast compilation
    • Distributed computing

28 of 30

29 of 30


30 of 30

References

  • A. Criminisi, J. Shotton, and E. Konukoglu, Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Microsoft Research Technical Report
  • P. Kontschieder, M. Fiterau, A. Criminisi, and S. Rota Bulò, Deep Neural Decision Forests, ICCV 2015
  • B. D. Haeffele and R. Vidal, Global Optimality in Tensor Factorization, Deep Learning, and Beyond, arXiv 2015
  • https://github.com/chrischoy/fully-differentiable-deep-ndf-tf