---
title: "Simulation"
output:
rmarkdown::html_vignette:
toc: true
number_sections: true
vignette: >
%\VignetteIndexEntry{Simulation}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
There are multiple ways to simulate MMRM datasets using `brms` and `brms.mmrm`.
# Simple
`brm_simulate_simple()` simulates a dataset from the prior predictive distribution of a simple special case of an MMRM.^[The function help file explains the details about the model parameterization.]
``` r
library(brms.mmrm)
set.seed(0)
sim <- brm_simulate_simple(
n_group = 3,
n_patient = 100,
n_time = 4
)
```
The `data` element has a classed `tibble` you can directly supply to `brm_formula()` and `brm_model()`.
``` r
sim$data
#> # A tibble: 1,200 × 4
#> patient time response group
#>
#> 1 patient_001 time_1 1.11 group_1
#> 2 patient_001 time_2 2.15 group_1
#> 3 patient_001 time_3 2.54 group_1
#> 4 patient_001 time_4 -1.73 group_1
#> 5 patient_002 time_1 1.11 group_1
#> 6 patient_002 time_2 2.64 group_1
#> 7 patient_002 time_3 1.69 group_1
#> 8 patient_002 time_4 0.783 group_1
#> 9 patient_003 time_1 0.118 group_1
#> 10 patient_003 time_2 2.48 group_1
#> # ℹ 1,190 more rows
```
The `parameters` element has the corresponding parameter values simulated from the joint prior. Arguments to `brm_simulate_simple()` control hyperparameters.
``` r
str(sim$parameters)
#> List of 5
#> $ beta : num [1:6] 1.263 -0.326 1.33 1.272 0.415 ...
#> $ tau : num [1:4] -0.092857 -0.029472 -0.000577 0.240465
#> $ sigma : num [1:4] 0.911 0.971 0.999 1.272
#> $ lambda : num [1:4, 1:4] 1 0.415 -0.818 -0.282 0.415 ...
#> $ covariance: num [1:4, 1:4] 0.831 0.368 -0.745 -0.326 0.368 ...
```
And the `model_matrix` element has the regression model matrix of fixed effect parameters.
``` r
head(sim$model_matrix)
#> groupgroup_1 groupgroup_2 groupgroup_3 timetime_2 timetime_3 timetime_4
#> 1 1 0 0 0 0 0
#> 2 1 0 0 1 0 0
#> 3 1 0 0 0 1 0
#> 4 1 0 0 0 0 1
#> 5 1 0 0 0 0 0
#> 6 1 0 0 1 0 0
```
# Change from baseline
`brm_data_change()` can convert the outcome variable from raw response to change from baseline. This applies to real datasets passed through [brm_data()] as well as simulated ones from e.g. [brm_simulate_simple()]. The dataset above uses raw response with a baseline time point of `"time_1"`
``` r
sim$data
#> # A tibble: 1,200 × 4
#> patient time response group
#>
#> 1 patient_001 time_1 1.11 group_1
#> 2 patient_001 time_2 2.15 group_1
#> 3 patient_001 time_3 2.54 group_1
#> 4 patient_001 time_4 -1.73 group_1
#> 5 patient_002 time_1 1.11 group_1
#> 6 patient_002 time_2 2.64 group_1
#> 7 patient_002 time_3 1.69 group_1
#> 8 patient_002 time_4 0.783 group_1
#> 9 patient_003 time_1 0.118 group_1
#> 10 patient_003 time_2 2.48 group_1
#> # ℹ 1,190 more rows
```
`brm_data_change()` subtracts baseline, replaces the raw response column with a new change from baseline column, adds a new column for the original baseline raw response, and adjusts the internal attributes of the classed object accordingly.
``` r
brm_data_change(
data = sim$data,
name_change = "new_change",
name_baseline = "new_baseline"
)
#> # A tibble: 900 × 5
#> patient time group new_change new_baseline
#>
#> 1 patient_001 time_2 group_1 1.04 1.11
#> 2 patient_001 time_3 group_1 1.43 1.11
#> 3 patient_001 time_4 group_1 -2.84 1.11
#> 4 patient_002 time_2 group_1 1.53 1.11
#> 5 patient_002 time_3 group_1 0.576 1.11
#> 6 patient_002 time_4 group_1 -0.328 1.11
#> 7 patient_003 time_2 group_1 2.37 0.118
#> 8 patient_003 time_3 group_1 3.07 0.118
#> 9 patient_003 time_4 group_1 -1.14 0.118
#> 10 patient_004 time_2 group_1 1.57 1.29
#> # ℹ 890 more rows
```
# Advanced
For a more nuanced simulation, build up the dataset layer by layer. Begin with `brm_simulate_outline()` to create an initial structure and a random missingness pattern. In `brm_simulate_outline()`, missing responses can come from either transitory intercurrent events or from dropouts. The `missing` column indicates which outcome values will be missing (`NA_real_`) in a later step. The `response` column is entirely missing for now and will be simulated later.
``` r
data <- brm_simulate_outline(
n_group = 2,
n_patient = 100,
n_time = 4,
rate_dropout = 0.3
)
data
#> # A tibble: 800 × 5
#> patient time group missing response
#>
#> 1 patient_001 time_1 group_1 FALSE NA
#> 2 patient_001 time_2 group_1 TRUE NA
#> 3 patient_001 time_3 group_1 TRUE NA
#> 4 patient_001 time_4 group_1 TRUE NA
#> 5 patient_002 time_1 group_1 FALSE NA
#> 6 patient_002 time_2 group_1 FALSE NA
#> 7 patient_002 time_3 group_1 TRUE NA
#> 8 patient_002 time_4 group_1 TRUE NA
#> 9 patient_003 time_1 group_1 FALSE NA
#> 10 patient_003 time_2 group_1 FALSE NA
#> # ℹ 790 more rows
```
Optionally add random continuous covariates `brm_simulate_continuous()` and random categorical covariates using `brm_simulate_categorical()`. In each case, the covariates are non-time-varying, which means each patient gets only one unique value.
``` r
data <- data |>
brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
brm_simulate_categorical(
names = c("status1", "status2"),
levels = c("present", "absent")
)
data
#> # A tibble: 800 × 9
#> patient time group missing response biomarker1 biomarker2 status1 status2
#>
#> 1 patient_0… time… grou… FALSE NA 0.328 -0.655 present absent
#> 2 patient_0… time… grou… TRUE NA 0.328 -0.655 present absent
#> 3 patient_0… time… grou… TRUE NA 0.328 -0.655 present absent
#> 4 patient_0… time… grou… TRUE NA 0.328 -0.655 present absent
#> 5 patient_0… time… grou… FALSE NA 1.04 -0.779 absent absent
#> 6 patient_0… time… grou… FALSE NA 1.04 -0.779 absent absent
#> 7 patient_0… time… grou… TRUE NA 1.04 -0.779 absent absent
#> 8 patient_0… time… grou… TRUE NA 1.04 -0.779 absent absent
#> 9 patient_0… time… grou… FALSE NA 0.717 -0.954 present absent
#> 10 patient_0… time… grou… FALSE NA 0.717 -0.954 present absent
#> # ℹ 790 more rows
```
As described in the next section, `brms.mmrm` has a convenient function `brm_simulate_prior()` to simulate the outcome variable `response` using the data skeleton above and the prior predictive distribution. However, if you prefer a full custom approach, you may need granular details about the parameterization, which requires the model matrix. Fortunately, `brms` supports a `make_standata()` function to provide this, given a dataset and a formula. You may need to temporarily set the response variable to something non-missing, and you may wish to specify a custom prior.
``` r
library(brms)
formula <- brm_formula(data = mutate(data, response = 0))
formula
#> response ~ group + group:time + time + biomarker1 + biomarker2 + status1 + status2 + unstr(time = time, gr = patient)
#> sigma ~ 0 + time
stan_data <- make_standata(
formula = formula,
data = mutate(data, response = 0)
)
model_matrix <- stan_data$X
head(model_matrix)
#> Intercept groupgroup_2 timetime_2 timetime_3 timetime_4 biomarker1 biomarker2
#> 1 1 0 0 0 0 0.3283275 -0.6547971
#> 2 1 0 1 0 0 0.3283275 -0.6547971
#> 3 1 0 0 1 0 0.3283275 -0.6547971
#> 4 1 0 0 0 1 0.3283275 -0.6547971
#> 5 1 0 0 0 0 1.0385746 -0.7793828
#> 6 1 0 1 0 0 1.0385746 -0.7793828
#> status1present status2present groupgroup_2:timetime_2 groupgroup_2:timetime_3
#> 1 1 0 0 0
#> 2 1 0 0 0
#> 3 1 0 0 0
#> 4 1 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> groupgroup_2:timetime_4
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
```
# Prior
Function `brm_simulate_prior()` simulates from the prior predictive distribution. It requires a dataset and a formula, and it accepts a custom prior constructed with `brms::set_prior()`.
``` r
formula <- brm_formula(data = data)
library(brms)
prior <- set_prior("student_t(3, 0, 1.3)", class = "Intercept") +
set_prior("student_t(3, 0, 1.2)", class = "b") +
set_prior("student_t(3, 0, 1.1)", class = "b", dpar = "sigma") +
set_prior("lkj(1)", class = "cortime")
prior
#> prior class coef group resp dpar nlpar lb ub source
#> student_t(3, 0, 1.3) Intercept user
#> student_t(3, 0, 1.2) b user
#> student_t(3, 0, 1.1) b sigma user
#> lkj(1) cortime user
sim <- brm_simulate_prior(
data = data,
formula = formula,
prior = prior,
refresh = 0
)
```
The output object `sim` has multiple draws from the prior predictive distribution. `sim$outcome` has outcome draws, and `sim$parameters` has parameter draws. `sim$model_matrix` has the model matrix, and `sim$model` has the full `brms` model fit object. You can pass `sim$model` to functions from `brms` and `bayesplot` such as `pp_check()`.
``` r
names(sim)
#> [1] "data" "model" "model_matrix" "outcome" "parameters"
```
In addition, `sim$data` has a copy of the original dataset, but with the outcome variable taken from the final draw from the prior predictive distribution. In addition, the missingness pattern is automatically applied so that `sim$data$response` is `NA_real_` whenever `sim$data$missing` equals `TRUE`.
``` r
sim$data
#> # A tibble: 800 × 9
#> patient time group missing response biomarker1 biomarker2 status1 status2
#>
#> 1 patient_0… time… grou… FALSE 3.90 0.328 -0.655 present absent
#> 2 patient_0… time… grou… TRUE NA 0.328 -0.655 present absent
#> 3 patient_0… time… grou… TRUE NA 0.328 -0.655 present absent
#> 4 patient_0… time… grou… TRUE NA 0.328 -0.655 present absent
#> 5 patient_0… time… grou… FALSE 3.46 1.04 -0.779 absent absent
#> 6 patient_0… time… grou… FALSE 2.66 1.04 -0.779 absent absent
#> 7 patient_0… time… grou… TRUE NA 1.04 -0.779 absent absent
#> 8 patient_0… time… grou… TRUE NA 1.04 -0.779 absent absent
#> 9 patient_0… time… grou… FALSE 5.28 0.717 -0.954 present absent
#> 10 patient_0… time… grou… FALSE 0.120 0.717 -0.954 present absent
#> # ℹ 790 more rows
```
# Posterior
`brms` supports posterior predictive simulations and checks through functions `posteiror_predict()`, `posterior_epred()`, and `pp_check()`. These can be used with a `brms` model fit object either from `brm_model()` or from `brm_simulate_prior()`.
``` r
data <- sim$data
formula <- brm_formula(data = data)
model <- brm_model(data = data, formula = formula, refresh = 0)
outcome_draws <- posterior_predict(object = model)
```
The returned `outcome_draws` object is a numeric array of posterior predictive draws, with one row per draw and one column per non-missing observation (row) in the original data.
``` r
str(outcome_draws)
#> num [1:4000, 1:659] 4.57 3.84 3.88 3.06 3.48 ...
dim(data)
#> [1] 800 9
sum(!is.na(data$response))
#> [1] 659
```