1 of 10

Practical Methodology for Deep Learning

André E. Lazzaretti

Universidade Tecnológica Federal do Paraná (UTFPR) - Curitiba

Pós-Graduação em Engenharia Elétrica e Informática Industrial (CPGEI)

Laboratório de Bioinformática e Inteligência Computacional (LABIC)

2 of 10

Introduction

  • Determine your goals - what error metric to use, and your target value for this error metric.
  • Establish a working end-to-end pipeline as soon as possible.
  • Instrumentation: Diagnose which components are performing worse than expected and whether poor performance is due to overfitting, underfitting, or a defect in the data or software.
  • Repeatedly make incremental changes such as gathering new data, adjusting hyperparameters, or changing algorithms, based on specific findings from your instrumentation.

3 of 10

Performance Metrics

  • Keep in mind that for most applications, it is impossible to achieve absolute zero error.
  • How can one determine a reasonable level of performance to expect?
    • Academic: we have some estimate of the error rate that is attainable based on previously published benchmark results.
    • Real-world: we have some idea of the error rate that is necessary for an application to be safe, cost-effective, or appealing to consumers.

4 of 10

Default Baseline Models

  • Optimization algorithm: Adam.
  • Batch normalization can have a dramatic effect on optimization performance, especially for convolutional networks and networks with sigmoidal nonlinearities.
  • Regularization: Dropout.
  • If your task is similar to another task that has been studied extensively, you will probably do well by first copying the model that is already known to perform best on the previously studied task.
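As an illustration of these defaults, a minimal Keras sketch of such a baseline: a small convolutional network with batch normalization, dropout, and the Adam optimizer. The input shape, layer sizes, and dropout rate are illustrative placeholders, not recommended values.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Default baseline: small conv net + batch normalization + dropout, trained with Adam.
# Input shape and layer sizes are placeholders; adapt them to your task.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, padding="same", use_bias=False),
    layers.BatchNormalization(),           # often dramatically improves optimization
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),                   # dropout as the default regularizer
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```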

5 of 10

Determining Whether to Gather More Data

  • First, determine whether the performance on the training set is acceptable.
  • If performance on the training set is poor, the learning algorithm is not using the available training data:
    • Try increasing the size of the model by adding more layers/hidden units to each layer.
    • Try improving the learning algorithm, for example by tuning the learning rate.
    • If large/tuned models do not work well, then the problem may be the quality of the training data.
  • If test set performance is much worse than training set performance:
    • Gathering more data may help.
    • Reduce the size of the model or improve regularization.
  • If performance on both the training and test sets is acceptable - Done!
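A schematic sketch of this decision procedure, assuming hypothetical train_error, test_error, and target_error values measured for the current model; the gap threshold is only a heuristic.

```python
def next_step(train_error, test_error, target_error):
    """Rough decision rule from this slide; the thresholds are illustrative."""
    if train_error > target_error:
        # The model is not even exploiting the data it already has:
        # increase capacity, tune the optimizer, or inspect data quality.
        return "increase model size, tune the learning algorithm, or check data quality"
    if test_error - train_error > train_error:  # large generalization gap (heuristic)
        # Overfitting: more data, a smaller model, or stronger regularization.
        return "gather more data, reduce model size, or improve regularization"
    return "done: training and test performance are acceptable"

print(next_step(train_error=0.12, test_error=0.30, target_error=0.05))
```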

6 of 10

Automatic Hyperparameter Optimization Algorithms

  • Hyperparameter optimization: find a value of the hyperparameters that optimizes an objective function, such as validation error, sometimes under constraints (such as a budget for training time, memory or recognition time).
  • Several approaches exist; two of the simplest are grid search and random search.

7 of 10

Grid Search

  • For each hyperparameter, the user selects a small finite set of values to explore.
  • Typically, a grid search involves picking values approximately on a logarithmic scale (e.g., a learning rate taken from {0.1, 0.01, 0.001, 0.0001}).
  • Grid search usually performs best when it is carried out repeatedly, refining the grid around the best values found in the previous pass.
  • The problem is that its computational cost grows exponentially with the number of hyperparameters (a sketch follows).
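A minimal sketch of a grid search over two hyperparameters. The train_and_evaluate function below is a hypothetical stand-in for a real training run that returns validation error; the candidate values are placeholders chosen on a logarithmic scale.

```python
import itertools
import math

def train_and_evaluate(lr, units):
    # Hypothetical stand-in for a real training run that returns validation
    # error; replace it with your own training/evaluation pipeline.
    return abs(math.log10(lr) + 2.5) + abs(units - 256) / 512

learning_rates = [1e-1, 1e-2, 1e-3, 1e-4]   # roughly logarithmic spacing
hidden_units = [64, 128, 256, 512]

best = None
for lr, units in itertools.product(learning_rates, hidden_units):
    val_error = train_and_evaluate(lr, units)
    if best is None or val_error < best[0]:
        best = (val_error, lr, units)

# Cost is len(learning_rates) * len(hidden_units) runs: the number of trials
# grows exponentially with the number of hyperparameters.
print("best (val_error, lr, units):", best)
```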

8 of 10

Random Search

  • First, we define a marginal distribution for each hyperparameter (e.g., uniform on a log scale for a learning rate).
  • We should not discretize or bin the values of the hyperparameters, so that the search can explore a larger set of distinct values.
  • The main reason that random search finds good solutions faster than grid search is that it has no wasted experimental runs: when one hyperparameter turns out not to matter, grid search repeats equivalent trials that differ only in that hyperparameter, whereas random search tries a new value of every hyperparameter on every run (see the sketch below).
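A corresponding random-search sketch, reusing the same hypothetical train_and_evaluate stand-in as in the grid-search sketch; the learning rate is sampled log-uniformly and the number of hidden units uniformly, rather than from a fixed grid.

```python
import math
import random

def train_and_evaluate(lr, units):
    # Same hypothetical stand-in for a real training run as before;
    # returns a validation error.
    return abs(math.log10(lr) + 2.5) + abs(units - 256) / 512

random.seed(0)
best = None
for trial in range(20):                   # budget is a number of runs, not a grid size
    lr = 10 ** random.uniform(-4, -1)     # log-uniform marginal distribution
    units = random.randint(32, 512)       # uniform marginal, not binned to a grid
    val_error = train_and_evaluate(lr, units)
    if best is None or val_error < best[0]:
        best = (val_error, lr, units)

print("best (val_error, lr, units):", best)
```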

9 of 10

Keras Example
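A minimal end-to-end sketch in this spirit; the dataset, architecture, and hyperparameters below are illustrative choices: load MNIST, build a small baseline classifier, train with Adam, and monitor validation error to guide the next steps of the methodology.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative end-to-end pipeline: load data, build a small baseline model,
# train with Adam, and monitor validation error to diagnose over/underfitting.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    batch_size=128, epochs=5,
                    validation_split=0.1)   # validation error guides next steps

test_loss, test_acc = model.evaluate(x_test, y_test)
print("test accuracy:", test_acc)
```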

10 of 10

Debugging Strategies

  • The bug may not be apparent just from examining the output of the model. Depending on the distribution of the input, the network may be able to adapt to compensate for particular bugs.
  • Typical approaches:
    • Visualize the model in action
    • Visualize the worst mistakes
    • Reason about software using training and test error
    • Fit a tiny dataset
    • Compare back-propagated derivatives to numerical derivatives
    • Monitor histograms of activations and gradients
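As one example of these strategies, a minimal sketch of the "fit a tiny dataset" check, assuming the model and MNIST arrays from the Keras example on the previous slide: a correctly implemented model should drive training error to nearly zero on a handful of examples, so failure to do so points to a software defect rather than a capacity or data problem.

```python
# Debugging check: overfit a tiny subset. If training accuracy does not
# approach 1.0 here, suspect a bug (wrong labels, broken loss, frozen weights)
# rather than underfitting on the full dataset.
tiny_x, tiny_y = x_train[:32], y_train[:32]   # model, x_train, y_train from the Keras example
history = model.fit(tiny_x, tiny_y, epochs=200, batch_size=32, verbose=0)
print("final training accuracy on tiny set:", history.history["accuracy"][-1])
```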