1 of 30

MAR 580: Models for Marine Ecosystem-Based Management

TMB workshop

Session I, why TMB?

(Acknowledgements: Mollie Brooks, Kasper Kristensen, Arni Magnusson, Anders Nielsen, André Punt)

06 September 2022

2 of 30

Schedule

gavinfay.github.io/mebm-models/tmb

Rstudio Cloud project

Aims of workshop:

* Introduce TMB

* Conduct several examples

3 of 30

Models in fisheries, fitting to data

  • Common exercise in fisheries and wildlife science is fitting a model to some data.
  • These models (and the data they are fit to) can be simple to more complicated.
    • e.g. regressions, stock-recruitment analysis, population dynamics models, etc.
  • Optimization tools in statistical software (e.g. lm() , glmer(), or nls() in R) can be used in a lot of instances.
  • Custom-built objective functions can also be helpful (e.g. using optimize() in R ).
  • Sometimes these tools can be slow or unstable when our models contain lots of parameters or datasets become large.

4 of 30

What is Template Model Builder?

Put simply, it is an R package to help conduct optimization (parameter estimation) of highly nonlinear models. It includes:

  • Tools for reading in most interesting data objects�(number, vector, matrix, arrays, ...)
  • Tools for specifying model parameters & constraints (unbounded, bounded, summing to zero, vectors of, ...)
  • Evaluating the marginal likelihood via Laplace Approximation
  • Calculation of derivatives of objective function using AD
  • Helpful functions
  • Linked workflow with R, allowing for minimization of objective function
  • Calculations for measures of uncertainty by:
    • estimating asymptotic variance-covariance matrices,
    • computing likelihood profiles,

5 of 30

TMB or AD Model Builder?

  • Many statistical stock assessment software packages in use today are written in ADMB,

e.g. ASAP, Stock Synthesis, CASAL, etc.

  • This is changing, TMB faster for random effects models:

e.g. SAM, WHAM,

  • Workflow similar: write a template file that contains model specification, and how data relate to the model.
  • Fast because the minimization algorithms use analytical derivatives and a quasi-Newton algorithm.
  • We need to specify the objective function, but T(AD)MB takes care of the minimization procedure.
  • No need to supply the derivatives of the function to be minimized with respect to the parameters; these are computed automatically using Reverse Mode Autodifferentiation (the ‘AD’).

6 of 30

Specifying the Problem – the TMB approach

  • TMB is an R package. Unlike ADMB, TMB workflow is embedded in R.
  • TMB programs are written using a template (a file with a .cpp extension).
  • The .cpp file specifies:
    • the data (constants) that are part of the function,
    • the parameters that are to be varied to minimize the function,
    • variables that depend on the parameters but are not parameters themselves; and
    • the function to be minimized.
  • The function itself is minimized from R.

7 of 30

Work environment – the TMB approach

  • Work within RStudio

Hint: make RStudio your default .cpp editor

Recommend running TMB:::setupRStudio()

  • Every TMB project consists of (at least):
    • an R script that controls the workflow
    • a .cpp file that specifies the model & objective function

8 of 30

8

Taken from https://github.com/James-Thorson/2016_Spatio-temporal_models

9 of 30

Work environment – the TMB approach

  • R script:
    • Create list objects for data & model parameters
    • Compile & link the C++ program
    • Load the compiled C++ code
    • Build model object (MakeADFun)
    • Fit model
    • Analyze results

10 of 30

First example: linear regression

11 of 30

Review

  • Every TMB model begins by including the header file

#include <TMB.hpp>

  • The objective function is then included in the following code:

template<class Type>

Type objective_function<Type>::operator() ()

{

//insert your model here

return neglogL:

}

12 of 30

Review

  • Comments in c++ are //
  • All variables must be declared before being used. No exceptions!
  • Expressions need to end with semicolons

13 of 30

Understanding MakeADFun arguments

  • data: the data to be passed to the function (as a list)
  • parameters: a list of estimable parameters (including random and fixed effects)
  • DLL: name of the DLL
  • map: provides a way to fix parameters�(to not estimate, or make parameters equal)
  • random: which variables are random effects
  • control: behaviour of fitting (tolerance, etc.)

14 of 30

MakeADFun’s map

  • Use the map argument to:
  • Fix parameters at their initial values:

e.g. map = list(b0=factor(NA)) #don’t estimate intercept

  • Mirror values within a vector

e.g. if you have a vector beta and you want elements 2 & 4 to be the same:

map = list(beta=factor(c(1,2,3,2)))

15 of 30

Reporting

Use REPORT to get results back into R

e.g.

Type sigma = exp(logSigma);

REPORT(sigma);

Access in R via:

model$report();

16 of 30

Variances for derived quantities

sdreport(model)

- asymptotic variances for parameters

To compute standard errors for derived quantities, define ADREPORT variable:

e.g.

Type sigma = exp(logSigma);

ADREPORT(sigma);

sdreport(model);

17 of 30

Variance information

  • Hessian matrix

model$he()

  • Variance-covariance matrix:

solve(model$he())

  • Correlation matrix:

cov2cor(solve(model$he()))

  • Standard errors:

sqrt(diag(solve(model$he()))))

18 of 30

Data Types

TMB needs to know what dimensions and format your data/variables/parameters will be in.

  • The most-basic types:
    • DATA_INTEGER (e.g. int Count) – integer.
    • DATA_IVECTOR – vector of integers.
    • DATA_IMATRIX – matrix of integers.
    • DATA_IARRAY – array of integers.
    • DATA_SCALAR (e.g. number) – real.
    • DATA_VECTOR – vector of reals.
    • DATA_MATRIX – matrix of reals.
    • DATA_ARRAY – array of reals.
    • DATA_STRUC – A structure.
    • DATA_STRING – A string.

  • All variables MUST be declared before they are used – no exemptions!

19 of 30

Parameter Types

PARAMETER(par_name);

PARAMETER_VECTOR(par_name);

Local Variables

Type p; [scalar]

vector<Type> q(5); [vector]

matrix<Type> z(5,5); [matrix]

array<Type>k(5,5,5); [array]

int I; [integer]

vector<int>jj(5); [integer vector]

20 of 30

Variable Names

  • Must start with an alphabetic character.
  • Don’t use any reserved words (if, else, etc.)
  • Choose descriptive but not overly long variable names (e.g. biomass, MSY).
  • TMB (and C) is case-sensitive, i.e. the variables biomass and Biomass are NOT the same variable.

Other rules / hints:

  • Use underscores to split names within a variable name (e.g. my_biomass).
  • Avoid re-using the same variable for different purposes.
  • Don’t forget those semi-colons!

21 of 30

http://kaskr.github.io/adcomp/_book/CppTutorial.html#i-know-r-but-not-c

22 of 30

Programming Style

Comment, comment, comment:

  • Include comments (indicated by “//”) at the start of the program that states what it does.
  • Split ideas / blocks of code with blank lines and comments.
  • Use “inline” comments to refer to equations and meanings of variables.
  • Indent blocks of code to increase clarity.
  • Include comments in your R scripts

(indicated by “#”).

23 of 30

Getting Help, and debugging hints

  • Arrays start at 0 rather than 1, this can confuse R users!
  • TMB wiki (https://github.com/kaskr/adcomp/wiki)
  • TMB book (http://kaskr.github.io/adcomp/_book/Introduction.html)
  • Start simple: develop a routine, test it, develop the next routine, …
  • Use “std::cout” to output variables to the screen so you can check that the function is being calculated correctly.
  • Test code by writing out intermediate results (you can then check that values output are correct given the values for the parameters – the first call of the function will involve the parameters being set to their initial values).
  • Cross compare: Write critical bits of code in EXCEL or R and check you get the correct values.
  • Never, ever, ever, believe your code is correct: always work on the assumption there is an error somewhere.

24 of 30

Linear modeling review LMs, GLMs, NLMs

  • LM and GLM are used to describe whether and how a response variable depends on a combination of predictors.
  • Predictors can be numeric (continuous) or factors (categorical).

  • Linear

Quite flexible as it is

  • Generalized

Any error distribution

  • Nonlinear

Any relationship between Y and X

closed�form�solution

iterative�solution

25 of 30

26 of 30

27 of 30

28 of 30

29 of 30

Example: Poisson GLM

  • Switch to MLE lecture notes

30 of 30

Exercise: Poisson GLM with covariate

  • Plot the residuals of the herring count model with respect to month
  • Extend the herring count TMB model to include a month effect
  • i.e. Fit the model:

Y(i) ~ Poisson(λ(i))

ln(λ(i)) = βmonth(i)

  • Plot the resulting residuals
  • Compare the model with the month effect to the single-intercept model using AIC.