1 of 10

Practical Methodology for Deep Learning

André E. Lazzaretti

Universidade Tecnológica Federal do Paraná (UTFPR) - Curitiba

Pós-Graduação em Engenharia Elétrica e Informática Industrial (CPGEI)

Laboratório de Bioinformática e Inteligência Computacional (LABIC)

2 of 10

Introduction

  • Determine your goals - what error metric to use, and your target value for this error metric.
  • Establish a working end-to-end pipeline as soon as possible.
  • Instrumentation: Diagnose which components are performing worse than expected and whether poor performance is due to overfitting, underfitting, or a defect in the data or software.
  • Repeatedly make incremental changes such as gathering new data, adjusting hyperparameters, or changing algorithms, based on specific findings from your instrumentation.

3 of 10

Performance Metrics

  • Keep in mind that for most applications, it is impossible to achieve absolute zero error.
  • How can one determine a reasonable level of performance to expect?
    • Academic: we have some estimate of the error rate that is attainable based on previously published benchmark results.
    • Real-world: we have some idea of the error rate that is necessary for an application to be safe, cost-effective, or appealing to consumers.

4 of 10

Default Baseline Models

  • Optimization algorithm: Adam.
  • Batch normalization can have a dramatic effect on optimization performance, especially for convolutional networks and networks with sigmoidal nonlinearities.
  • Regularization: Dropout.
  • If your task is similar to another task that has been studied extensively, you will probably do well by first copying the model that is already known to perform best on the previously studied task.
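As an illustration of these defaults, a minimal Keras sketch of such a baseline: a small convolutional network with batch normalization, dropout, and the Adam optimizer. The input shape, layer sizes, and dropout rate are illustrative placeholders, not recommended values.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Default baseline: small conv net + batch normalization + dropout, trained with Adam.
# Input shape and layer sizes are placeholders; adapt them to your task.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, padding="same", use_bias=False),
    layers.BatchNormalization(),           # often dramatically improves optimization
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                   # dropout as the default regularizer
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```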

5 of 10

Determining Whether to Gather More Data

  • First, determine whether the performance on the training set is acceptable.
  • If performance on the training set is poor, the learning algorithm is not using the available training data:
    • Try increasing the size of the model by adding more layers/hidden units to each layer.
    • Try improving the learning algorithm, for example by tuning the learning rate.
    • If large/tuned models do not work well, then the problem may be the quality of the training data.
  • If test set performance is much worse than training set performance:
    • Gathering more data may help.
    • Reduce the size of the model or improve regularization.
  • If performance on both the training and test sets is acceptable - Done!
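A schematic sketch of this decision procedure, assuming hypothetical train_error, test_error, and target_error values measured for the current model; the gap threshold is only a heuristic.

```python
def next_step(train_error, test_error, target_error):
    """Rough decision rule from this slide; the thresholds are illustrative."""
    if train_error > target_error:
        # The model is not even exploiting the data it already has:
        # increase capacity, tune the optimizer, or inspect data quality.
        return "increase model size, tune the learning algorithm, or check data quality"
    if test_error - train_error > train_error:  # large generalization gap (heuristic)
        # Overfitting: more data, a smaller model, or stronger regularization.
        return "gather more data, reduce model size, or improve regularization"
    return "done: training and test performance are acceptable"

print(next_step(train_error=0.12, test_error=0.30, target_error=0.05))
```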

6 of 10

Automatic Hyperparameter Optimization Algorithms

  • Hyperparameter optimization: find a value of the hyperparameters that optimizes an objective function, such as validation error, sometimes under constraints (such as a budget for training time, memory or recognition time).
  • Several approaches exist; two of the simplest are grid search and random search.

7 of 10

Grid Search

  • For each hyperparameter, the user selects a small finite set of values to explore.
  • Typically, a grid search involves picking values approximately on a logarithmic scale (e.g., a learning rate taken from {0.1, 0.01, 0.001, 0.0001}).
  • Grid search usually performs best when it is carried out repeatedly, refining the grid around the best values found in the previous pass.
  • The problem is that its computational cost grows exponentially with the number of hyperparameters (a sketch follows).
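A minimal sketch of a grid search over two hyperparameters. The train_and_evaluate function below is a hypothetical stand-in for a real training run that returns validation error; the candidate values are placeholders chosen on a logarithmic scale.

```python
import itertools
import math

def train_and_evaluate(lr, units):
    # Hypothetical stand-in for a real training run that returns validation
    # error; replace it with your own training/evaluation pipeline.
    return abs(math.log10(lr) + 2.5) + abs(units - 256) / 512

learning_rates = [1e-1, 1e-2, 1e-3, 1e-4]   # roughly logarithmic spacing
hidden_units = [64, 128, 256, 512]

best = None
for lr, units in itertools.product(learning_rates, hidden_units):
    val_error = train_and_evaluate(lr, units)
    if best is None or val_error < best[0]:
        best = (val_error, lr, units)

# Cost is len(learning_rates) * len(hidden_units) runs: the number of trials
# grows exponentially with the number of hyperparameters.
print("best (val_error, lr, units):", best)
```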

8 of 10

Random Search

  • First, we define a marginal distribution for each hyperparameter (e.g., uniform on a log scale for a learning rate).
  • We should not discretize or bin the values of the hyperparameters, so that the search can explore a larger set of distinct values.
  • The main reason that random search finds good solutions faster than grid search is that it has no wasted experimental runs: when one hyperparameter turns out not to matter, grid search repeats equivalent trials that differ only in that hyperparameter, whereas random search tries a new value of every hyperparameter on every run (see the sketch below).
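A corresponding random-search sketch, reusing the same hypothetical train_and_evaluate stand-in as in the grid-search sketch; the learning rate is sampled log-uniformly and the number of hidden units uniformly, rather than from a fixed grid.

```python
import math
import random

def train_and_evaluate(lr, units):
    # Same hypothetical stand-in for a real training run as before;
    # returns a validation error.
    return abs(math.log10(lr) + 2.5) + abs(units - 256) / 512

random.seed(0)
best = None
for trial in range(20):                   # budget is a number of runs, not a grid size
    lr = 10 ** random.uniform(-4, -1)     # log-uniform marginal distribution
    units = random.randint(32, 512)       # uniform marginal, not binned to a grid
    val_error = train_and_evaluate(lr, units)
    if best is None or val_error < best[0]:
        best = (val_error, lr, units)

print("best (val_error, lr, units):", best)
```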

9 of 10

Keras Example
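A minimal end-to-end sketch in this spirit; the dataset, architecture, and hyperparameters below are illustrative choices: load MNIST, build a small baseline classifier, train with Adam, and monitor validation error to guide the next steps of the methodology.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative end-to-end pipeline: load data, build a small baseline model,
# train with Adam, and monitor validation error to diagnose over/underfitting.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    batch_size=128, epochs=5,
                    validation_split=0.1)   # validation error guides next steps

test_loss, test_acc = model.evaluate(x_test, y_test)
print("test accuracy:", test_acc)
```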

10 of 10

Debugging Strategies

  • The bug may not be apparent just from examining the output of the model. Depending on the distribution of the input, the network may be able to adapt to compensate for particular bugs.
  • Typical approaches:
    • Visualize the model in action
    • Visualize the worst mistakes
    • Reason about software using training and test error
    • Fit a tiny dataset
    • Compare back-propagated derivatives to numerical derivatives
    • Monitor histograms of activations and gradients
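As one example of these strategies, a minimal sketch of the "fit a tiny dataset" check, assuming the model and MNIST arrays from the Keras example on the previous slide: a correctly implemented model should drive training error to nearly zero on a handful of examples, so failure to do so points to a software defect rather than a capacity or data problem.

```python
# Debugging check: overfit a tiny subset. If training accuracy does not
# approach 1.0 here, suspect a bug (wrong labels, broken loss, frozen weights)
# rather than underfitting on the full dataset.
tiny_x, tiny_y = x_train[:32], y_train[:32]   # model, x_train, y_train from the Keras example
history = model.fit(tiny_x, tiny_y, epochs=200, batch_size=32, verbose=0)
print("final training accuracy on tiny set:", history.history["accuracy"][-1])
```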