Decision Forest and Deep Neural Decision Forest

Christopher B. Choy (chrischoy@ai.stanford.edu)

2016/02/05 SDL-Reading Group Presentation

Table of Contents

  • Decision Forest
    • Testing
    • Training
  • Deep Neural Decision Forest
    • Formulation
    • Training
    • Experiments
  • CNN vs Deep-NDF vs Decision Forest
  • Fully Differentiable Deep Neural Decision Forest
    • Formulation
    • Training
    • Implementation
  • References

Decision Forest

Decision Forest

  • Classification, regression
  • Decision tree
    • Overfits easily
  • Ensemble Learning Method
    • Decision trees
    • Mean or mode of the individual tree predictions

Decision Forest: testing

Input: a data point v = (x₁, …, x_d) ∈ ℝ^d

Split function: a binary test h(v, θ_j) ∈ {0, 1} at split node j (notation following the Criminisi et al. report)

Given a previously unseen data point v a decision tree hierarchically applies split functions. Starting at the root, each split node applies its associated split function to v.

Depending on the result of the binary test, the data point is sent to the left or right child. This process is repeated until the data point reaches a leaf node.

Leaf nodes contain a predictor (e.g. a classifier or a regressor) that associates the input v with an output (e.g. a class label).
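
A minimal sketch of this routing procedure in plain Python (the dict-based node layout, field names, and example tree are my own illustration, not the report's notation):

```python
# Minimal sketch of decision-tree inference: route a data point v from the
# root to a leaf by repeatedly applying axis-aligned binary split functions.
# The node layout (dicts with "feature", "threshold", children, "predictor")
# is a hypothetical illustration, not the report's notation.

def predict(tree, v):
    node = tree  # start at the root
    while "predictor" not in node:                            # split nodes have no predictor
        goes_left = v[node["feature"]] <= node["threshold"]   # binary test h(v, theta)
        node = node["left"] if goes_left else node["right"]
    return node["predictor"]                                  # e.g. class distribution stored at the leaf

# Example: a depth-1 tree splitting on feature 0 at threshold 0.5.
tree = {
    "feature": 0, "threshold": 0.5,
    "left":  {"predictor": {"cat": 0.9, "dog": 0.1}},
    "right": {"predictor": {"cat": 0.2, "dog": 0.8}},
}
print(predict(tree, [0.3, 1.2]))  # -> {'cat': 0.9, 'dog': 0.1}
```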

Decision Forest: training

  • for each tree
    • select a subset from the training set
    • make a root node
    • while a terminal node satisfies the branching condition (e.g. its entropy is still high)
      • make a decision boundary along a (possibly rotated) axis that maximizes information gain (mutual information); see the sketch after this list
    • for each terminal (leaf) node, store a predictor
      • Discrete: a class distribution
      • Continuous: a scalar or vector estimate
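
A sketch of the split search mentioned above (NumPy, axis-aligned splits only; the randomized feature subset and the threshold enumeration are simplifications I chose, not the exact procedure from the report):

```python
# Minimal sketch of randomized node optimization: pick the axis-aligned split
# that maximizes information gain on the data reaching a node.
import numpy as np

def entropy(labels):
    """H(X) = -sum_x p(x) log2 p(x) over the empirical class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(X, y, n_random_features=5):
    """Search a random subset of axes; return (feature, threshold, information gain)."""
    best = (None, None, -np.inf)
    h_parent = entropy(y)
    subset = np.random.choice(X.shape[1], size=min(n_random_features, X.shape[1]), replace=False)
    for f in subset:
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # information gain = H(parent) - weighted H(children)
            h_children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = h_parent - h_children
            if gain > best[2]:
                best = (f, t, gain)
    return best

X = np.array([[0.1, 1.0], [0.4, 0.2], [0.9, 0.8], [0.7, 0.3]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # e.g. (0, 0.4, 1.0): splitting feature 0 at 0.4 separates the classes
```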

  • Why entropy?
    • Measure of information/uncertainty
      • e.g. a fair coin flip (head/tail): 1 bit
      • Number of bits needed to encode the event
    • Information of an event x: −log₂ p(x)
    • Entropy: the expected information, H(X) = −Σₓ p(x) log₂ p(x)
  • Why mutual information (information gain)?
    • Mutual information (information theory) = information gain (computer vision)
    • Information gain: I(X; Y) = H(X) − H(X | Y)
    • Easier to interpret as mutual information
    • Reduction in uncertainty about X once Y is observed
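
A tiny worked example of these quantities (the numbers are my own, purely for intuition):

$$H\left(\tfrac{1}{2},\tfrac{1}{2}\right) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}, \qquad I(X;Y) = H(X) - H(X \mid Y)$$

A node holding 4 + 4 samples of two classes has entropy 1 bit; a split that sends all of one class left and all of the other right produces children with entropy 0, so the information gain is the full 1 bit.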

Adding Randomness

  • Regularize overfitting
  • Training set sampling
  • Randomized node optimization
    • Random axis selection
  • Ensemble Model

Random Forest in Practice

  • Efficient human pose estimation from single depth images, CVPR 2012
    • Segmentation (classification), regression

Deep Neural Decision Forest

Deep Neural Decision Forest (Deep-NDF)

  • Random Forest
    • Multiclass/regression problem
    • Distributable
  • Neural Network
    • Representation Learning
    • End-to-end training using gradient backpropagation
    • Global optimality for positively homogeneous systems [Haeffele et al.]
    • Modular, hardware acceleration
  • Instead of applying weak classifiers to raw data, learn the features with backpropagation
    • Representation learning in the loop
  • Neural Decision Forests for Semantic Image Labelling, CVPR 2014
    • Features learned locally, independently
  • Deep Neural Decision Forests, ICCV 2015

Deep-NDF: NN + Forest

  • Decision Node
    • A sigmoid applied to an FC output gives the routing probability at each split node
    • e.g. the probability of reaching the third leaf is the product of the routing probabilities along its path

  • Leaf Node
    • Holds a probability distribution over class labels
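
Putting the two pieces together, the formulation from the ICCV 2015 paper (reproduced from memory, so treat the exact notation as a sketch): decision node n squashes a network output f_n through a sigmoid, leaf ℓ is reached with the product of the routing probabilities along its path, and the prediction mixes the leaf distributions:

$$d_n(x;\Theta) = \sigma\big(f_n(x;\Theta)\big), \qquad \mu_\ell(x) = \prod_{n \in \mathrm{path}(\ell)} d_n(x)^{\mathbf{1}[\ell \swarrow n]} \big(1 - d_n(x)\big)^{\mathbf{1}[n \searrow \ell]}, \qquad \mathbb{P}[y \mid x, \Theta, \pi] = \sum_{\ell} \pi_{\ell y}\, \mu_\ell(x)$$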

Deep-NDF

Deep-NDF: training

  • Decision nodes
    • Differentiable with respect to the network weights, so they are trained by backpropagation

  • Leaf nodes
    • Given all network weights, the leaf distributions have a closed-form update
    • Note that the update is a sum over the entire dataset
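
The closed-form step, as I recall it from the paper (an EM-style update over the training set 𝒯, with Z a per-leaf normalizer):

$$\pi_{\ell y}^{(t+1)} = \frac{1}{Z_\ell^{(t)}} \sum_{(x, y') \in \mathcal{T}} \mathbf{1}[y = y'] \, \frac{\pi_{\ell y}^{(t)} \, \mu_\ell(x \mid \Theta)}{\mathbb{P}\big[y \mid x, \Theta, \pi^{(t)}\big]}$$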

Deep-NDF: training

  • Alternating optimization: SGD on the network weights with the leaf distributions fixed, then the closed-form update of the leaf distributions with the network fixed

Deep-NDF: experiments

  • GoogLeNet*: connect features from each softmax to higher softmax layers
  • Randomly select 500 dims from prev layer

[Figure: dNDF.NET architecture: GoogLeNet* with 10 trees attached at each of its three softmax locations]

Caveat

  • 1k epochs using 100k mini-batches, on 12 hosts with K40 GPUs
    • GoogLeNet* batch size: 50
  • 10 trees (depth 15) at each softmax (3 softmaxes in total)
    • × 30 trees ≈ 1.97M
    • 500 × 1k features × 2M
  • From the 1k-dim feature vector, randomly select 500 elements to generate the decision functions f

  • 3 vs. 30 softmaxes, with random-subset FC features + 100k mini-batches, 1k epochs

Convexity comparison

  • CNN
    • Softmax, cross entropy
    • convex
  • Deep-NDF
    • Negative log of a product of sigmoids and a constant
    • A sum of exponential-family terms: convex
  • Convexity at the last loss layer matters
    • Positively homogeneous system with convex loss: global optima [Haeffele et al.]
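
For concreteness, the two per-sample losses being compared (standard forms in my own notation; the convexity claims above are the slide's, stated with respect to the last-layer activations):

$$L_{\mathrm{CNN}}(x, y) = -\log \frac{e^{f_y(x)}}{\sum_c e^{f_c(x)}}, \qquad L_{\mathrm{NDF}}(x, y) = -\log \sum_{\ell} \pi_{\ell y} \, \mu_\ell(x; \Theta)$$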

Leaf Node

Less uncertainty as training progresses

Comparison

|                                 | CNN       | Deep-NDF  | Decision Forest |
| Feature Learning                | Yes       | Yes       | No              |
| Divide-and-conquer              | No        | Yes       | Yes             |
| Mutual Information Maximization | No        | No        | Yes             |
| Ease of parallelization         | Difficult | Difficult | Easy            |
| Gradient Descent                | Yes       | Yes       | No              |
| Convexity of loss               | Convex    | Convex    | N/A             |

Fully Differentiable Deep-NDF

Leaf Node

  • Probability distribution stored at each leaf
  • Alternating optimization
    • The leaf distributions lag behind the network weights
  • Learn them with gradients instead
    • They must lie on the simplex
    • positive, sum to 1
    • Constrained optimization?
  • Parametrize the probabilities as a softmax over unconstrained parameters

  • Final loss: −log(product of sigmoids × softmax): a sum of convex functions, hence convex
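
A minimal NumPy sketch of the idea (my own simplification; names, shapes, and tree layout are mine, see the TensorFlow repository in the references for the actual implementation): the leaf distributions are the softmax of unconstrained parameters W, so they stay on the simplex and the whole loss can be trained in a single gradient pass.

```python
# Minimal NumPy sketch of fully differentiable leaf nodes for one tree:
# leaf distributions are softmax(W), routing probabilities are products of
# sigmoids, and the loss is the usual negative log-likelihood.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ndf_forward(decisions, W, y):
    """decisions: (n_split,) pre-sigmoid network outputs for one sample, laid
    out level by level (heap order) for a full binary tree; W: (n_leaf, n_class)
    unconstrained leaf parameters; y: true class index. Returns (p_y, loss)."""
    d = sigmoid(decisions)                  # routing prob. of going left at each split
    n_leaf = len(d) + 1
    mu = np.ones(n_leaf)                    # prob. of reaching each leaf
    for leaf in range(n_leaf):
        node, lo, hi = 0, 0, n_leaf         # walk from the root towards this leaf
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if leaf < mid:                  # leaf lies in the left subtree
                mu[leaf] *= d[node]
                node, hi = 2 * node + 1, mid
            else:                           # leaf lies in the right subtree
                mu[leaf] *= 1.0 - d[node]
                node, lo = 2 * node + 2, mid
    pi = softmax(W, axis=1)                 # leaf class distributions on the simplex
    p_y = (mu * pi[:, y]).sum()             # P[y | x] = sum_l pi_{l,y} mu_l(x)
    return p_y, -np.log(p_y)

# Example: a depth-2 tree (3 split nodes, 4 leaves), 3 classes.
p, loss = ndf_forward(np.array([0.2, -1.0, 0.5]), np.random.randn(4, 3), y=1)
print(p, loss)
```

Because π = softmax(W), gradients flow into W directly and the separate closed-form leaf update is no longer needed.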

Symbolic Math Library

  • Symbolic math library
    • Theano
    • autograd
    • TensorFlow
  • TensorFlow
    • Symbolic math
    • Fast compilation
    • Distributed computing

References

  • A. Criminisi, J. Shotton and E. Konukoglu, Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Microsoft Research Technical Report
  • P. Kontschieder, M. Fiterau, A. Criminisi and S. Rota Bulò, Deep Neural Decision Forests, ICCV 2015
  • B. D. Haeffele and R. Vidal, Global Optimality in Tensor Factorization, Deep Learning, and Beyond, arXiv 2015
  • https://github.com/chrischoy/fully-differentiable-deep-ndf-tf
