1 of 30

MAR 580: Models for Marine Ecosystem-Based Management

TMB workshop

Session I, why TMB?

(Acknowledgements: Mollie Brooks, Kasper Kristensen, Arni Magnusson, Anders Nielsen, André Punt)

06 September 2022

2 of 30

Schedule

gavinfay.github.io/mebm-models/tmb�

Rstudio Cloud project

Aims of workshop:

* Introduce TMB

* Conduct several examples

3 of 30

Models in fisheries, fitting to data

Common exercise in fisheries and wildlife science is fitting a model to some data.
These models (and the data they are fit to) can be simple to more complicated.

e.g. regressions, stock-recruitment analysis, population dynamics models, etc.

Optimization tools in statistical software (e.g. lm() , glmer(), or nls() in R) can be used in a lot of instances.
Custom-built objective functions can also be helpful (e.g. using optimize() in R ).
Sometimes these tools can be slow or unstable when our models contain lots of parameters or datasets become large.

4 of 30

What is Template Model Builder?

Put simply, it is an R package to help conduct optimization (parameter estimation) of highly nonlinear models. It includes:

Tools for reading in most interesting data objects�(number, vector, matrix, arrays, ...)
Tools for specifying model parameters & constraints (unbounded, bounded, summing to zero, vectors of, ...)
Evaluating the marginal likelihood via Laplace Approximation
Calculation of derivatives of objective function using AD
Helpful functions
Linked workflow with R, allowing for minimization of objective function
Calculations for measures of uncertainty by:

estimating asymptotic variance-covariance matrices,
computing likelihood profiles,

5 of 30

TMB or AD Model Builder?

Many statistical stock assessment software packages in use today are written in ADMB,

e.g. ASAP, Stock Synthesis, CASAL, etc.

This is changing, TMB faster for random effects models:

e.g. SAM, WHAM,

Workflow similar: write a template file that contains model specification, and how data relate to the model.
Fast because the minimization algorithms use analytical derivatives and a quasi-Newton algorithm.
We need to specify the objective function, but T(AD)MB takes care of the minimization procedure.
No need to supply the derivatives of the function to be minimized with respect to the parameters; these are computed automatically using Reverse Mode Autodifferentiation (the ‘AD’).

6 of 30

Specifying the Problem – the TMB approach

TMB is an R package. Unlike ADMB, TMB workflow is embedded in R.
TMB programs are written using a template (a file with a .cpp extension).
The .cpp file specifies:

the data (constants) that are part of the function,
the parameters that are to be varied to minimize the function,
variables that depend on the parameters but are not parameters themselves; and
the function to be minimized.

The function itself is minimized from R.

7 of 30

Work environment – the TMB approach

Work within RStudio

Hint: make RStudio your default .cpp editor

Recommend running TMB:::setupRStudio()

Every TMB project consists of (at least):

an R script that controls the workflow
a .cpp file that specifies the model & objective function

8 of 30

Taken from https://github.com/James-Thorson/2016_Spatio-temporal_models

9 of 30

Work environment – the TMB approach

R script:

Create list objects for data & model parameters
Compile & link the C++ program
Load the compiled C++ code
Build model object (MakeADFun)
Fit model
Analyze results

10 of 30

First example: linear regression

11 of 30

Review

Every TMB model begins by including the header file

#include <TMB.hpp>

The objective function is then included in the following code:

template<class Type>

Type objective_function<Type>::operator() ()

{

//insert your model here

return neglogL:

}

12 of 30

Review

Comments in c++ are //
All variables must be declared before being used. No exceptions!
Expressions need to end with semicolons

13 of 30

Understanding MakeADFun arguments

data: the data to be passed to the function (as a list)
parameters: a list of estimable parameters (including random and fixed effects)
DLL: name of the DLL
map: provides a way to fix parameters�(to not estimate, or make parameters equal)
random: which variables are random effects
control: behaviour of fitting (tolerance, etc.)

14 of 30

MakeADFun’s map

Use the map argument to:
Fix parameters at their initial values:

e.g. map = list(b0=factor(NA)) #don’t estimate intercept

Mirror values within a vector

e.g. if you have a vector beta and you want elements 2 & 4 to be the same:

map = list(beta=factor(c(1,2,3,2)))

15 of 30

Reporting

Use REPORT to get results back into R

e.g.

Type sigma = exp(logSigma);

REPORT(sigma);

Access in R via:

model$report();

16 of 30

Variances for derived quantities

sdreport(model)

- asymptotic variances for parameters

To compute standard errors for derived quantities, define ADREPORT variable:

e.g.

Type sigma = exp(logSigma);

ADREPORT(sigma);

sdreport(model);

17 of 30

Variance information

Hessian matrix

model$he()

Variance-covariance matrix:

solve(model$he())

Correlation matrix:

cov2cor(solve(model$he()))

Standard errors:

sqrt(diag(solve(model$he()))))

18 of 30

Data Types

TMB needs to know what dimensions and format your data/variables/parameters will be in.

The most-basic types:

DATA_INTEGER (e.g. int Count) – integer.
DATA_IVECTOR – vector of integers.
DATA_IMATRIX – matrix of integers.
DATA_IARRAY – array of integers.
DATA_SCALAR (e.g. number) – real.
DATA_VECTOR – vector of reals.
DATA_MATRIX – matrix of reals.
DATA_ARRAY – array of reals.
DATA_STRUC – A structure.
DATA_STRING – A string.

All variables MUST be declared before they are used – no exemptions!

19 of 30

Parameter Types

PARAMETER(par_name);

PARAMETER_VECTOR(par_name);

Local Variables

Type p; [scalar]

vector<Type> q(5); [vector]

matrix<Type> z(5,5); [matrix]

array<Type>k(5,5,5); [array]

int I; [integer]

vector<int>jj(5); [integer vector]

20 of 30

Variable Names

Must start with an alphabetic character.
Don’t use any reserved words (if, else, etc.)
Choose descriptive but not overly long variable names (e.g. biomass, MSY).
TMB (and C) is case-sensitive, i.e. the variables biomass and Biomass are NOT the same variable.

Other rules / hints:

Use underscores to split names within a variable name (e.g. my_biomass).
Avoid re-using the same variable for different purposes.
Don’t forget those semi-colons!

21 of 30

http://kaskr.github.io/adcomp/_book/CppTutorial.html#i-know-r-but-not-c

22 of 30

Programming Style

Comment, comment, comment:

Include comments (indicated by “//”) at the start of the program that states what it does.
Split ideas / blocks of code with blank lines and comments.
Use “inline” comments to refer to equations and meanings of variables.
Indent blocks of code to increase clarity.
Include comments in your R scripts

(indicated by “#”).

23 of 30

Getting Help, and debugging hints

Arrays start at 0 rather than 1, this can confuse R users!
TMB wiki (https://github.com/kaskr/adcomp/wiki)
TMB book (http://kaskr.github.io/adcomp/_book/Introduction.html)
Start simple: develop a routine, test it, develop the next routine, …
Use “std::cout” to output variables to the screen so you can check that the function is being calculated correctly.
Test code by writing out intermediate results (you can then check that values output are correct given the values for the parameters – the first call of the function will involve the parameters being set to their initial values).
Cross compare: Write critical bits of code in EXCEL or R and check you get the correct values.
Never, ever, ever, believe your code is correct: always work on the assumption there is an error somewhere.

24 of 30

Linear modeling review LMs, GLMs, NLMs

LM and GLM are used to describe whether and how a response variable depends on a combination of predictors.
Predictors can be numeric (continuous) or factors (categorical).

Linear

Quite flexible as it is

Generalized

Any error distribution

Nonlinear

Any relationship between Y and X

closed�form�solution

iterative�solution

29 of 30

Example: Poisson GLM

Switch to MLE lecture notes

30 of 30

Exercise: Poisson GLM with covariate

Plot the residuals of the herring count model with respect to month
Extend the herring count TMB model to include a month effect
i.e. Fit the model:

Y(i) ~ Poisson(λ(i))

ln(λ(i)) = β_month(i)

Plot the resulting residuals
Compare the model with the month effect to the single-intercept model using AIC.