Introduction

This vignette focus on the use of the calibrate() for parameter estimation of Ordinary Differential Equations (EDO) systems. We suggest to see the vignette ‘Getting started with the calibrar package’ before reading this one, specially if you do not have previous experience doing optimization in R. Reading the vignette ‘Using the calibrate() function for parameter estimation’ may be also very useful, as not every detail is reproduced here.

Fitting an EDO system: the Lotka-Volterra Predator-Prey model

We will estimate the parameters for a predator-prey Lotka-Volterra model using the calibrate() function. The model is defined by a system of ordinary differential equations for the abundance of prey \(N\) and predator \(P\): \[\frac{dN}{dt} = rN(1-N/K)-\alpha NP\] \[\frac{dP}{dt} = -lP + \gamma\alpha NP\] The parameters to estimate are the prey’s growth rate \(r\), the predator’s mortality rate \(l\), the carrying capacity of the prey \(K\) and \(\alpha\) and \(\gamma\) for the predation interaction. To start, we created the demonstration data for this model using the function calibrar_demo() function with T=100 as an additional argument to specify the time horizon.

library(calibrar)
set.seed(880820)
path = NULL # NULL to use the current directory
LV = calibrar_demo(path=path, model='PredatorPrey', T=100) 
## Creating observed data list for calibration...
## Loaded observed data for variables: 'prey', 'predator'.
setup = calibration_setup(file = LV$setup)
observed = calibration_data(setup=setup, path=LV$path)
## Creating observed data list for calibration...
## 
## Loaded observed data for variables: 'prey', 'predator'.
run_model = calibrar:::.PredatorPreyModel
coef(LV)
## $r
## [1] 0.5
## 
## $l
## [1] 0.2
## 
## $K
## [1] 100
## 
## $alpha
## [1] 0.1
## 
## $gamma
## [1] 0.1
## 
## $initial
## $initial$N
## [1] 10
## 
## $initial$P
## [1] 1

The run_model will simulate the data, by solving the ODE system defined by the Lotka-Volterra model:

run_model = function(par, T) {
  if(!requireNamespace("deSolve", quietly = TRUE)) 
    stop("You need to install the 'deSolve' package.")
  # par is a list with 'alpha', 'beta' 'gamma', 'sd' and 'mu_ini'.
  LV = function(t, y, parms, ...) {
    r = parms$r
    l = parms$l
    alpha = parms$alpha
    gamma = parms$gamma
    K = parms$K
    dN = r*y[1]*(1-(y[1]/K)) - alpha*y[1]*y[2]
    dP = -l*y[2] + gamma*alpha*y[1]*y[2]
    return(list(c(dN, dP)))
  }
  times = seq(0, T)
  y0 = c(par$initial$N, par$initial$P)
  sol = deSolve::ode(y=y0, times=times, func=LV, parms=par, method="ode45")
  out = as.list(as.data.frame(sol[,-1]))
  names(out) = c("prey", "predator")
  out$prey[is.na(out$prey)] = 0
  out$predator[is.na(out$predator)] = 0
  return(out)
}

The core of the run_model function relies in solving the ODE system by using the ode() function of the deSolve package. In general, the run_model takes a par argument (that can be a vector or a list) and produce a named list with the simulated data. All the intermediate code solves the simulation problem of taking a set of parameters to produce numerical outputs for each simulated variable. We will also define the objective function, based on the setup created by the demo:

variable type calibrate weight use_data file varid col_skip nrows
prey lnorm2 TRUE 1 TRUE data/prey.csv prey 2 NA
predator lnorm2 TRUE 1 TRUE data/predator.csv predator 2 NA

The calibration_objFn() will automatically create the objective function following the information in the setup, in this case, by using a log-normal distribution for the errors (type) and the same weight for both. The data is expected to be read from the files in the file column, files that were created when calling the demo.

# objective functions
obj = calibration_objFn(model=run_model, setup=setup, observed=observed, T=LV$T, aggregate=TRUE)

Now we can fit the model using several optimization methods:

lbfgsb1 = calibrate(par=LV$guess, fn=obj, method='L-BFGS-B', lower=LV$lower, upper=LV$upper, phases=LV$phase)
## Using optimization method 'L-BFGS-B'.
## Elapsed time: 3.91s
## Function value: 5.60587e-07
## Parameter values: 0.5 0.2 100 0.1 0.1
## 
## Status: CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH
lbfgsb2 = calibrate(par=LV$guess, fn=obj, method="Rvmmin", lower=LV$lower, upper=LV$upper, phases=LV$phase)
## Using optimization method 'Rvmmin'.
## Elapsed time: 5.07s
## Function value: 5.60575e-07
## Parameter values: 0.5 0.2 100 0.1 0.1
## 
## Status: Rvmminb appears to have converged
ahr = calibrate(par=LV$guess, fn=obj, method='AHR-ES', lower=LV$lower, upper=LV$upper, phases=LV$phase)
## Using optimization method 'AHR-ES'.
## Elapsed time: 4m 4.3s
## Function value: 5.60575e-07
## Parameter values: 0.5 0.2 100 0.1 0.1
## 
## Status: Stopping criteria reached in 4909 generations.
nm = calibrate(par=LV$guess, fn=obj, method="Nelder-Mead", phases=LV$phase)
## Using optimization method 'Nelder-Mead'.
## Elapsed time: 45.54s
## Function value: 1.41517
## Parameter values: 0.45 0.275 66.2 0.0788 0.183
## 
## Status: -

And compare them:

summary(LV, lbfgsb1, lbfgsb2, ahr, nm, show_par = 1:5)
##              method elapsed    value    fn gr    r     l     K  alpha gamma
## LV             data      NA 4.96e-07    NA NA 0.50 0.200 100.0 0.1000 0.100
## lbfgsb1    L-BFGS-B    3.91 5.61e-07    74 74 0.50 0.200 100.0 0.1000 0.100
## lbfgsb2      Rvmmin    5.07 5.61e-07   132 83 0.50 0.200 100.0 0.1000 0.100
## ahr          AHR-ES  244.35 5.61e-07 39272  0 0.50 0.200 100.0 0.1000 0.100
## nm      Nelder-Mead   45.54 1.42e+00   502 NA 0.45 0.275  66.2 0.0788 0.183

When a function is created with the calibration_objFn() it gains a predict() method, that can be used to simulate the model with the estimated set of parameters.

lbfgsb1.pred = predict(lbfgsb1)
lbfgsb2.pred = predict(lbfgsb2)
ahr.pred     = predict(ahr)
nm.pred      = predict(nm)

and plot the results.

methods = c("data", "L-BFGS-B", "AHR-ES", "Nelder-Mead")
par(mfrow=c(1,2), mar=c(4,4,1,1),
    oma=c(1,1,1,1))
plot(observed$prey, cex=0.75,
     ylab="prey abundance (N)", xlab="time", las=1,
     ylim=c(0,55))
lines(lbfgsb1.pred$prey, col=1, lwd=4)
lines(ahr.pred$prey, col=2, lwd=2)
lines(nm.pred$prey, col=3, lwd=2)

plot(observed$predator, cex=0.75, 
     ylab="predator abundance (P)", xlab="time", las=1,
     ylim=c(0,7))
lines(lbfgsb1.pred$predator, col=1, lwd=4)
lines(ahr.pred$predator, col=2, lwd=2)
lines(nm.pred$predator, col=3, lwd=2)

legend(100, 1.8, legend=methods, bty="n", cex=0.75, y.intersp=0.8, 
       inset=-0.0, xjust=1, pch = c(1, rep(NA,5)), lty=c(0, rep(1,5)), 
       col=c(1,1:3), lwd=2)

In this example, the ‘L-BFGS-B’ and ‘AHR-ES’ algorithms were able to estimate the original parameter values, but the ‘Nelder-Mead’ algorithm were not.

Fitting an EDO system: the SIR epidemiological model

As a second example, we will estimate the parameters for a SIR epidemiological model. The model is defined by a system of ordinary differential equations for the number of susceptible \(S\), infected \(I\) and recovered \(R\) individuals: \[\frac{dS}{dt} = -\beta S I/N\] \[\frac{dI}{dt} = \beta S I/N -\gamma I\] \[\frac{dR}{dt} = \gamma I\]

The parameters to estimate are the average number of contacts per person per time \(\beta\) and the instant probability of an infectious individual recovering \(\gamma\). To start, we created the demonstration data for this model using the function calibrar_demo() function with T=100 as an additional argument to specify the time horizon.

path = NULL # NULL to use the current directory
SIR = calibrar_demo(path=path, model='SIR', T=100) 
## Creating observed data list for calibration...
## Loaded observed data for variables: 'susceptible', 'infected', 'recovered'.
setup = calibration_setup(file = SIR$setup)
observed = calibration_data(setup=setup, path=SIR$path)
## Creating observed data list for calibration...
## 
## Loaded observed data for variables: 'susceptible', 'infected', 'recovered'.
run_model = calibrar:::.SIRModel

To simulate the model, we will create a function taking a vector or list of parameters par, that solves the EDO system and return a list with the simulated variables (\(S\), \(I\), \(R\)):

run_model = function(par, T) {
  if(!requireNamespace("deSolve", quietly = TRUE))
    stop("You need to install the 'deSolve' package.")
  # par is a list with 'alpha', 'beta' 'gamma', 'sd' and 'mu_ini'.
  SIR = function(t, y, parms, ...) {
    N = sum(unlist(parms$initial))
    beta = parms$beta
    gamma = parms$gamma
    S = y[1]
    I = y[2]
    dS = -beta*S*I/N
    dI = +beta*S*I/N -gamma*I
    dR = +gamma*I
    return(list(c(dS, dI, dR)))
  }
  times = seq(0, T)
  y0 = c(par$initial$S, par$initial$I, par$initial$R)
  sol = deSolve::ode(y=y0, times=times, func=SIR, parms=par, method="ode45")
  out = as.list(as.data.frame(sol[,-1]))
  names(out) = c("susceptible", "infected", "recovered")
  out$susceptible[is.na(out$susceptible)] = 0
  out$infected[is.na(out$infected)] = 0
  out$recovered[is.na(out$recovered)] = 0
  return(out)
}

As in the previous example, the objective function will be created based on the run_model() function and the information in the setup table created with the demo:

variable type calibrate weight use_data file varid col_skip nrows
susceptible lnorm2 TRUE 1 TRUE data/susceptible.csv susceptible 2 NA
infected lnorm2 TRUE 1 TRUE data/infected.csv infected 2 NA
recovered lnorm2 TRUE 1 TRUE data/recovered.csv recovered 2 NA
obj = calibration_objFn(model=run_model, setup=setup, observed=observed, T=SIR$T, aggregate=TRUE)

Now, we can try several optimization algorithms to estimate the parameters of the model:

lbfgsb3 = calibrate(par=SIR$guess, fn=obj, method='LBFGSB3', lower=SIR$lower, upper=SIR$upper, phases=SIR$phase)
## Using optimization method 'LBFGSB3'.
## Elapsed time: 7.68s
## Function value: 2.38128
## Parameter values: 0.408 0.215
## 
## Status: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
lbfgsb2 = calibrate(par=SIR$guess, fn=obj, method='Rvmmin', lower=SIR$lower, upper=SIR$upper, phases=SIR$phase)
## Using optimization method 'Rvmmin'.
## Elapsed time: 0.30s
## Function value: 7.56969e-13
## Parameter values: 0.4 0.2
## 
## Status: Rvmminb appears to have converged
ahr = calibrate(par=SIR$guess, fn=obj, method='AHR-ES', lower=SIR$lower, upper=SIR$upper, phases=SIR$phase)
## Using optimization method 'AHR-ES'.
## Elapsed time: 26.50s
## Function value: 9.62938e-22
## Parameter values: 0.4 0.2
## 
## Status: Stopping criteria reached in 1031 generations.
cg = calibrate(par=SIR$guess, fn=obj, method='Rcgmin', phases=SIR$phase)
## Using optimization method 'Rcgmin'.
## Elapsed time: 2m 15.6s
## Function value: 1.42228e-14
## Parameter values: 0.4 0.2
## 
## Status: Too many function evaluations (> 866)
nm = calibrate(par=SIR$guess, fn=obj, method='Nelder-Mead', phases=SIR$phase)
## Using optimization method 'Nelder-Mead'.
## Elapsed time: 0.29s
## Function value: 2.5122e-05
## Parameter values: 0.4 0.2
## 
## Status: -

and compare them:

summary(SIR, lbfgsb2, lbfgsb3, ahr, cg, nm, show_par = 1:2)
##              method elapsed    value   fn  gr  beta gamma
## SIR            data      NA 4.45e-28   NA  NA 0.400 0.200
## lbfgsb2      Rvmmin   0.297 7.57e-13   24  16 0.400 0.200
## lbfgsb3     LBFGSB3   7.677 2.38e+00   86  86 0.408 0.215
## ahr          AHR-ES  26.499 9.63e-22 6186   0 0.400 0.200
## cg           Rcgmin 135.606 1.42e-14  868 114 0.400 0.200
## nm      Nelder-Mead   0.288 2.51e-05   73  NA 0.400 0.200

In this example, the algorithms ‘Rvmmin’, ‘AHR-ES’, ‘Rcgmin’ and ‘Nelder-Mead’ are able to estimate the original parameters, but the ‘LBFGSB3’ fails.

Please, refer to vignette(package="calibrar") for additional vignettes or to the calibrar website for more details.