Help for package simulist

Title:

Simulate Disease Outbreak Line List and Contacts Data

Version:

0.6.0

Description:

Tools to simulate realistic raw case data for an epidemic in the form of line lists and contacts using a branching process. Simulated outbreaks are parameterised with epidemiological parameters and can have age-structured populations, age-stratified hospitalisation and death risk and time-varying case fatality risk.

License:

MIT + file LICENSE

URL:

https://github.com/epiverse-trace/simulist, https://epiverse-trace.github.io/simulist/

BugReports:

https://github.com/epiverse-trace/simulist/issues

Depends:

R (≥ 4.2.0)

Imports:

checkmate, english, epiparameter (≥ 0.4.0), grates, randomNames, rlang, stats

Suggests:

dplyr, epicontacts (≥ 1.1.3), ggplot2, incidence2 (≥ 2.6.2), knitr, rmarkdown, spelling, testthat (≥ 3.0.0), tidyr

VignetteBuilder:

knitr

Config/Needs/website:

epiverse-trace/epiversetheme

Config/testthat/edition:

Encoding:

UTF-8

Language:

en-GB

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-08-28 10:32:22 UTC; lshjl15

Author:

Joshua W. Lambert [aut, cre, cph], Carmen Tamayo Cuartero [aut], Hugo Gruson [ctb, rev], Pratik R. Gupte [ctb, rev], Adam Kucharski [rev], Chris Hartgerink [rev], Sebastian Funk [ctb], London School of Hygiene and Tropical Medicine, LSHTM [cph]

Maintainer:

Joshua W. Lambert <joshua.lambert@lshtm.ac.uk>

Repository:

CRAN

Date/Publication:

2025-08-28 11:10:02 UTC

simulist: Simulate Disease Outbreak Line List and Contacts Data

Description

Author(s)

Maintainer: Joshua W. Lambert joshua.lambert@lshtm.ac.uk (ORCID) [copyright holder]

Authors:

Carmen Tamayo Cuartero carmen.tamayo-cuartero@lshtm.ac.uk (ORCID)

Other contributors:

Hugo Gruson hugo@data.org (ORCID) [contributor, reviewer]
Pratik R. Gupte pratik.gupte@lshtm.ac.uk (ORCID) [contributor, reviewer]
Adam Kucharski adam.kucharski@lshtm.ac.uk (ORCID) [reviewer]
Chris Hartgerink chris@data.org (ORCID) [reviewer]
Sebastian Funk sebastian.funk@lshtm.ac.uk (ORCID) [contributor]
London School of Hygiene and Tropical Medicine, LSHTM (00a0jsq62) [copyright holder]

Add line list event dates and case information as columns to infectious history `⁠<data.frame>⁠`

Description

These .add_*() functions add columns to the <data.frame> output by .sim_network_bp(). The <data.frame> supplied to .data will have a different number of columns depending on which function is being called (i.e. the <data.frame> supplied to .add_hospitalisation() will have more columns than the <data.frame> supplied to .add_date_contact(), as the former function is called later in the simulation).

The event date could be first contact, last contact or other.

Usage

.add_date_contact(
  .data,
  first_contact_distribution,
  last_contact_distribution,
  outbreak_start_date
)

.add_hospitalisation(.data, onset_to_hosp, hosp_risk)

.add_outcome(
  .data,
  onset_to_death,
  onset_to_recovery,
  hosp_death_risk,
  non_hosp_death_risk,
  config
)

.add_names(.data, anonymise = FALSE)

.add_ct(.data, distribution)

.add_reporting_delay(.data, reporting_delay)

Arguments

.data

A ⁠<data.frame>⁠ containing the infectious history from a branching process simulation (.sim_network_bp()).

first_contact_distribution, last_contact_distribution

A function to generate the time for the first or last contact between the infector and infectee (exposure window). See create_config().

outbreak_start_date

A date for the start of the outbreak.

onset_to_hosp

A function or an ⁠<epiparameter>⁠ object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.

If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

hosp_risk

Either a single numeric for the hospitalisation risk of everyone in the population, or a ⁠<data.frame>⁠ with age specific hospitalisation risks. Default is 20% hospitalisation (0.2) for the entire population. If the onset_to_hosp argument is set to NULL this argument will automatically be set to NULL if not specified or can be manually set to NULL. See details and examples for more information.

onset_to_death

A function or an ⁠<epiparameter>⁠ object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.

If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

For hospitalised cases, the function ensures the onset-to-death time is greater than the onset-to-hospitalisation time. If, after many (1000) attempts, an onset-to-death time (from onset_to_death) cannot be sampled that is greater than an onset-to-hospitalisation time (from onset_to_hosp), then the function will error. Due to this conditional sampling, the onset-to-death times in the line list may not resemble the distributional form input into the function.

onset_to_recovery

A function or an ⁠<epiparameter>⁠ object for the onset-to-recovery delay distribution. onset_to_recovery can also be NULL to not simulate dates for individuals that recovered.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is NULL so by default cases that recover get an NA in the ⁠$date_outcome⁠ line list column.

For hospitalised cases, the function ensures the onset-to-recovery time is greater than the onset-to-hospitalisation time. If, after many (1000) attempts, an onset-to-recovery time (from onset_to_recovery) cannot be sampled that is greater than an onset-to-hospitalisation time (from onset_to_hosp), then the function will error. Due to this conditional sampling, the onset-to-recovery times in the line list may not resemble the distributional form input into the function.

hosp_death_risk

Either a single numeric for the death risk for hospitalised individuals across the population, or a <data.frame> with age-specific hospitalised death risks. Default is 50% death risk in hospitals (0.5) for the entire population. If the onset_to_death argument is set to NULL this argument will automatically be set to NULL if not specified or can be manually set to NULL. See details and examples for more information. The hosp_death_risk can vary through time if specified in the time_varying_death_risk element of config, see vignette("time-varying-cfr", package = "simulist") for more information.

non_hosp_death_risk

Either a single numeric for the death risk outside of hospitals across the population, or a <data.frame> with age-specific death risks outside of hospitals. Default is 5% death risk outside of hospitals (0.05) for the entire population. If the onset_to_death argument is set to NULL this argument will automatically be set to NULL if not specified or can be manually set to NULL. See details and examples for more information. The non_hosp_death_risk can vary through time if specified in the time_varying_death_risk element of config, see vignette("time-varying-cfr", package = "simulist") for more information.

config

A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.

anonymise

A logical boolean for whether case names should be anonymised. Default is FALSE.

reporting_delay

A function for the reporting delay distribution or NULL. The (random) number generating function creates delays between the time of symptom onset (⁠$date_onset⁠) and the case being reported (⁠$date_reporting⁠).

The function can be defined or anonymous. The function must return a vector of numerics for the length of the reporting delay. The function must have a single argument.

The default is NULL so by default there is no reporting delay, and the ⁠$date_reporting⁠ line list column is identical to the ⁠$date_onset⁠ column.

Value

A <data.frame> with one more column than input into .data, unless the column heading is already present, in which case the data is overwritten.
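As a hedged illustration of the delay arguments described above, a delay distribution can be written as a function that takes a single argument (the number of delays to draw) and returns a numeric vector. The sketch below uses only base R. The names onset_to_hosp_fn and reporting_delay_fn, and the reporting delay parameters, are illustrative assumptions and not part of simulist.

```r
# Named function: the single argument n is the number of delays to draw.
# The parameters match the documented default for onset_to_hosp.
onset_to_hosp_fn <- function(n) {
  stats::rlnorm(n = n, meanlog = 1.5, sdlog = 0.5)
}

# A one-line function of the same form for a reporting delay.
# These lognormal parameters are arbitrary values chosen for illustration.
reporting_delay_fn <- function(n) stats::rlnorm(n = n, meanlog = 1, sdlog = 0.5)

set.seed(1)
hosp_delays <- onset_to_hosp_fn(5)
report_delays <- reporting_delay_fn(5)
```

Either function could then be supplied wherever a delay distribution function is expected, for example as onset_to_hosp or reporting_delay.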

Introduce user-specified proportion of custom missing values into a `⁠<data.frame>⁠`

Description

Introduce user-specified proportion of custom missing values into a ⁠<data.frame>⁠

Usage

.add_missing(linelist, .args)

Arguments

linelist

Line list ⁠<data.frame>⁠ output from sim_linelist().

.args

A list of settings from messy_linelist().

Value

A line list ⁠<data.frame>⁠
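The idea of introducing a fixed proportion of missing values can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the internal implementation: the function name add_missing_sketch and its arguments prop_missing and missing_value are hypothetical, and the choice to sample missing cells separately for each column is an assumption.

```r
# Hypothetical helper: replace a proportion of cells in each column with a
# custom missing value. Names and defaults are illustrative assumptions.
add_missing_sketch <- function(df, prop_missing = 0.1, missing_value = NA) {
  n_missing <- round(nrow(df) * prop_missing)
  for (col in names(df)) {
    # Sample row indices without replacement for this column.
    idx <- sample.int(nrow(df), size = n_missing)
    df[[col]][idx] <- missing_value
  }
  df
}

set.seed(1)
toy <- data.frame(id = 1:20, age = sample(1:90, 20, replace = TRUE))
toy_missing <- add_missing_sketch(toy, prop_missing = 0.25)
```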

Anonymise names

Description

A simple algorithm to replace names with an alphanumeric string with a fixed number of characters (i.e. nchar()) specified by string_len.

Usage

.anonymise(x, string_len = 10)

Arguments

x

A vector of character strings.

string_len

A single numeric specifying the number of alphanumeric characters to use for each anonymising character string. Default is 10.

Value

A vector of character strings of equal length to the input.
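One way such an anonymisation could work is sketched below in base R. The function anonymise_sketch is hypothetical and is not the package's internal algorithm; it only shows the documented behaviour of returning one fixed-length alphanumeric string per input name.

```r
# Hypothetical sketch: map each name to a random alphanumeric string of
# string_len characters. This is not the package's internal algorithm.
anonymise_sketch <- function(x, string_len = 10) {
  pool <- c(letters, LETTERS, 0:9)
  vapply(
    x,
    function(name) paste(sample(pool, string_len, replace = TRUE), collapse = ""),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE
  )
}

set.seed(1)
anon <- anonymise_sketch(c("Ada Lovelace", "John Snow", "Florence Nightingale"))
```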

Check if `⁠<data.frame>⁠` defining either age-stratified hospitalisation or death risk, or defining age structure of population is correct

Description

Check if ⁠<data.frame>⁠ defining either age-stratified hospitalisation or death risk, or defining age structure of population is correct

Usage

.check_df(x, df_type = c("risk", "age"), age_range = NULL)

Arguments

x

A ⁠<data.frame>⁠.

df_type

A character string, either "risk" or "age" to specify which input ⁠<data.frame>⁠ is being checked.

age_range

A numeric vector of length 2. Only required when df_type = "risk", NULL by default.

Value

A ⁠<data.frame>⁠, also called for error side-effects when input is invalid.

Check if R object is line list from `sim_linelist()`

Description

Check if R object is line list from sim_linelist()

Usage

.check_linelist(linelist)

Arguments

linelist

Line list ⁠<data.frame>⁠ output from sim_linelist().

Details

This checks that the object supplied to linelist is from the sim_linelist() or sim_outbreak() functions. It is not related to the class of the object; in other words, it does not check that the object is of class <linelist>.

Value

Invisibly return the linelist ⁠<data.frame>⁠. The function is called for its side-effects, which will error if the input is invalid.

Check if arguments input to simulation function are valid

Description

Check if arguments input to simulation function are valid

Usage

.check_sim_input(
  sim_type = c("linelist", "contacts", "outbreak"),
  contact_distribution,
  infectious_period,
  prob_infection,
  outbreak_start_date,
  outbreak_size,
  onset_to_hosp = NULL,
  onset_to_death = NULL,
  onset_to_recovery = NULL,
  anonymise = NULL,
  case_type_probs = NULL,
  contact_tracing_status_probs = NULL,
  hosp_risk = NULL,
  hosp_death_risk = NULL,
  non_hosp_death_risk = NULL,
  population_age = NULL
)

Arguments

sim_type

A character string specifying which simulation function this function is being called within.

contact_distribution

A function or an ⁠<epiparameter>⁠ object to generate the number of contacts per infection.

The function can be defined or anonymous. The function must have a single argument in the form of an integer vector with elements representing the number of contacts, and return a numeric vector where each element corresponds to the probability of observing the number of contacts in the vector passed to the function. The index of the numeric vector returned is offset by one to the corresponding probability of observing the number of contacts, i.e. the first element of the output vector is the probability of observing zero contacts, the second element is the probability of observing one contact, etc.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a probability mass function internally.

The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\lambda) of 2 contacts per infection.

infectious_period

A function or an ⁠<epiparameter>⁠ object for the infectious period. This defines the duration from becoming infectious to no longer infectious. In the simulation, individuals are assumed to become infectious immediately after being infected (the latency period is assumed to be zero). The time intervals between an infected individual and their contacts are assumed to be uniformly distributed within the infectious period. Infectious periods must be strictly positive.

The function can be defined or anonymous. The function must return a vector of randomly generated real numbers representing sampled infectious periods. The function must have a single argument, the number of random infectious periods to generate.

An <epiparameter> can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.

prob_infection

A single numeric for the probability of a secondary contact being infected by an infected primary contact.

outbreak_start_date

A date for the start of the outbreak.

outbreak_size

A numeric vector of length 2 defining the minimum and the maximum number of infected individuals for the simulated outbreak. Default is c(10, 1e4), so the minimum outbreak size is 10 infected individuals, and the maximum outbreak size is 10,000 infected individuals. Either number can be changed to increase or decrease the maximum or minimum outbreak size to allow simulating larger or smaller outbreaks. If the minimum outbreak size cannot be reached after running the simulation for many iterations (internally) then the function errors, whereas if the maximum outbreak size is exceeded the function returns the data early with a warning stating how many cases and contacts are returned.

onset_to_hosp

A function or an ⁠<epiparameter>⁠ object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.

If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_death

A function or an ⁠<epiparameter>⁠ object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.

If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_recovery

A function or an ⁠<epiparameter>⁠ object for the onset-to-recovery delay distribution. onset_to_recovery can also be NULL to not simulate dates for individuals that recovered.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is NULL so by default cases that recover get an NA in the ⁠$date_outcome⁠ line list column.

anonymise

A logical boolean for whether case names should be anonymised. Default is FALSE.

case_type_probs

A named numeric vector with the probability of each case type. The names of the vector must be "suspected", "probable", "confirmed". Values of each case type must sum to one.

contact_tracing_status_probs

A named numeric vector with the probability of each contact tracing status. The names of the vector must be "under_followup", "lost_to_followup", "unknown". Values of each contact tracing status must sum to one.

hosp_risk

hosp_death_risk

non_hosp_death_risk

population_age

Either a numeric vector with two elements or a <data.frame> with age structure in the population. Use a numeric vector to specify the age range of the population: the first element is the lower bound for the age range, and the second is the upper bound for the age range (both inclusive, i.e. [lower, upper]). The <data.frame> contains age groups and the proportion of the population in each group. See details and examples for more information.

Details

Arguments that are used by all simulation functions are required and not given a default value. Other arguments, which are not inputs in all simulation functions, have a default of NULL.

Defaults mentioned in the argument documentation are the defaults for the exported simulation functions and not the defaults in this checking function. In this function there is either no default or NULL.

Value

Invisibly return the sim_type character string. The function is called for its side-effects, which will error if the input is invalid.
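The inputs described in the arguments above can be illustrated with a short base R sketch. The contact distribution uses the documented default (a Poisson probability mass function with a mean of 2); the probability values in the two named vectors are arbitrary examples chosen to sum to one.

```r
# Probability mass function: the single argument is an integer vector of
# contact counts; the return value is the probability of each count.
contact_distribution <- function(x) stats::dpois(x = x, lambda = 2)

# The first element is the probability of zero contacts, the second the
# probability of one contact, and so on.
contact_probs <- contact_distribution(0:3)

# Named probability vectors; the values are arbitrary and sum to one.
case_type_probs <- c(suspected = 0.2, probable = 0.3, confirmed = 0.5)
contact_tracing_status_probs <- c(
  under_followup = 0.7, lost_to_followup = 0.2, unknown = 0.1
)
```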

Cross check the onset-to-hospitalisation or -death arguments are compatible with hospitalisation and death risks

Description

There are two types of cross-checking:

If the onset-to-event distribution is specified but the corresponding risk is not specified (i.e. NULL) the function will error (stop()).
If the onset-to-event distribution is not specified (i.e. NULL) but the corresponding risk is specified the function will throw a warning (warning()).

The difference in condition handling is because in the case that the onset-to-event is NULL the simulation can safely ignore the corresponding risk, while throwing a warning, as there are no events. In other words, if the onset-to-hospitalisation is not specified, no infected individuals will go to hospital and the date_admission column in the line list will contain only NAs. However, if the onset-to-event is specified and the corresponding risk is NULL then the proportion of individuals infected that are hospitalised or die cannot be calculated and therefore the simulation cannot run. It is in this latter case that the cross-checking throws an error.

Usage

.cross_check_sim_input(
  onset_to_hosp,
  onset_to_death,
  hosp_risk,
  hosp_death_risk,
  non_hosp_death_risk
)

Arguments

onset_to_hosp

A function or an ⁠<epiparameter>⁠ object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.

If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_death

A function or an ⁠<epiparameter>⁠ object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.

If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

hosp_risk

hosp_death_risk

non_hosp_death_risk

Value

Invisibly return the onset_to_hosp argument. The function is called for its side-effects, which will error or warn if the input is invalid.
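The two cross-checking rules described for this function can be sketched for a single distribution and risk pair as follows. The function cross_check_sketch and its messages are hypothetical; the sketch only mirrors the documented error and warning behaviour and is not the package's internal code.

```r
# Hypothetical sketch of the two cross-checking rules for one
# distribution/risk pair; this is not the package's internal code.
cross_check_sketch <- function(onset_to_event, risk) {
  if (!is.null(onset_to_event) && is.null(risk)) {
    stop("Distribution is specified but the corresponding risk is NULL.")
  }
  if (is.null(onset_to_event) && !is.null(risk)) {
    warning("Risk is specified but the distribution is NULL; risk is ignored.")
  }
  invisible(onset_to_event)
}

# Rule 1: a distribution without a risk is an error.
res_error <- tryCatch(
  cross_check_sketch(function(n) stats::rlnorm(n), risk = NULL),
  error = function(e) "error"
)

# Rule 2: a risk without a distribution is a warning.
res_warning <- tryCatch(
  cross_check_sketch(NULL, risk = 0.2),
  warning = function(w) "warning"
)
```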

Sample names using `randomNames::randomNames()`

Description

Sample names for specified sexes by sampling with replacement, to avoid exhausting the number of names available when sample.with.replacement = FALSE. Duplicated names produced during sampling need to be removed to ensure each individual has a unique name. In order to have enough unique names, more names than required are sampled from randomNames::randomNames(), and the level of oversampling is determined by the buffer_factor argument. If buffer_factor is too high, more names are sampled than needed, which takes longer; if buffer_factor is too low, not enough unique names are sampled and the .sample_names() function will need to loop until it has enough unique names.

Usage

.sample_names(.data, buffer_factor = 1.5)

Arguments

.data

A ⁠<data.frame>⁠ containing the infectious history from a branching process simulation (.sim_network_bp()).

buffer_factor

A single numeric determining the level of oversampling (or buffer) when creating a vector of unique names from randomNames::randomNames().

Value

A character vector.
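The oversample, deduplicate and loop strategy described above can be sketched in base R. To keep the example self-contained it uses a small stand-in generator, draw_names, instead of randomNames::randomNames(); both draw_names and sample_unique_names are hypothetical, and accumulating unique names across iterations is an assumption about how the loop might work.

```r
# Stand-in name generator used only for this sketch; the package itself
# draws names from randomNames::randomNames().
draw_names <- function(n) {
  first <- sample(c("Ana", "Ben", "Chi", "Dev", "Eve"), n, replace = TRUE)
  last <- sample(c("Ito", "Khan", "Lee", "Mbeki", "Novak"), n, replace = TRUE)
  paste(first, last)
}

# Oversample by buffer_factor, drop duplicates, and loop until enough
# unique names are available.
sample_unique_names <- function(n, buffer_factor = 1.5) {
  unique_names <- character(0)
  while (length(unique_names) < n) {
    candidates <- draw_names(ceiling(n * buffer_factor))
    unique_names <- unique(c(unique_names, candidates))
  }
  unique_names[seq_len(n)]
}

set.seed(1)
case_names <- sample_unique_names(10)
```

A larger buffer_factor makes it more likely that one pass yields enough unique names, at the cost of drawing more names per pass.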

Sample the onset-to-outcome time conditional that the outcome is after a hospitalisation event

Description

The outcome of a case, either died or recovered, can have a time of event, but this must be after the hospitalisation time if the case has been admitted to hospital. This function samples either the onset-to-death or onset-to-recovery time conditional on it being greater than the onset-to-hospitalisation time for a given case, if the case was admitted to hospital. It does this by resampling onset-to-outcome (death or recovery) times if they are less than the onset-to-hospitalisation time (from .add_hospitalisation()).

Usage

.sample_outcome_time(.data, onset_to_outcome, idx)

Arguments

.data

A ⁠<data.frame>⁠ containing the infectious history from a branching process simulation (.sim_network_bp()).

onset_to_outcome

A function for either the onset-to-death or onset-to-recovery delay distribution. onset_to_outcome can also be set to NULL to not simulate dates for individuals that died or recovered. See sim_linelist() documentation for more information.

idx

Either the infected_lgl_idx or died_lgl_idx from .add_outcome() to define who to sample recovery or death times for, respectively.

Value

A ⁠<data.frame>⁠ with a potentially modified ⁠$outcome_time⁠ column.
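The conditional resampling described for this function can be sketched in base R as below. The function sample_outcome_sketch is hypothetical and works on plain vectors rather than the infectious history <data.frame>; the cap of 1000 attempts follows the argument documentation for onset_to_death and onset_to_recovery, and treating NA hospitalisation times as non-hospitalised cases is an assumption.

```r
# Hypothetical sketch: resample onset-to-outcome times until each exceeds
# the corresponding onset-to-hospitalisation time, erroring after max_tries.
sample_outcome_sketch <- function(hosp_time, onset_to_outcome, max_tries = 1000) {
  outcome_time <- onset_to_outcome(length(hosp_time))
  # Non-hospitalised cases (NA hospitalisation time) are never resampled.
  needs_resample <- !is.na(hosp_time) & outcome_time <= hosp_time
  tries <- 0
  while (any(needs_resample)) {
    tries <- tries + 1
    if (tries > max_tries) {
      stop("Could not sample outcome times after hospitalisation times.")
    }
    outcome_time[needs_resample] <- onset_to_outcome(sum(needs_resample))
    needs_resample <- !is.na(hosp_time) & outcome_time <= hosp_time
  }
  outcome_time
}

set.seed(1)
hosp_time <- c(2.5, NA, 6, NA, 4)
death_time <- sample_outcome_sketch(
  hosp_time,
  onset_to_outcome = function(n) stats::rlnorm(n, meanlog = 2.5, sdlog = 0.5)
)
```

Because short outcome times are rejected for hospitalised cases, the accepted times follow a truncated version of the input distribution rather than the distribution itself.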

Internal simulation function called by the exported simulation functions within simulist

Description

This internal function simulates a line list, and when sim_type is "contacts" or "outbreak" a contacts table as well.

Usage

.sim_internal(
  sim_type = c("linelist", "contacts", "outbreak"),
  contact_distribution,
  infectious_period,
  prob_infection,
  onset_to_hosp = NULL,
  onset_to_death = NULL,
  onset_to_recovery = NULL,
  reporting_delay = NULL,
  hosp_risk = NULL,
  hosp_death_risk = NULL,
  non_hosp_death_risk = NULL,
  outbreak_start_date,
  anonymise = NULL,
  outbreak_size,
  population_age,
  case_type_probs = NULL,
  contact_tracing_status_probs = NULL,
  config
)

Arguments

sim_type

A character string specifying which simulation function this function is being called within.

contact_distribution

A function or an ⁠<epiparameter>⁠ object to generate the number of contacts per infection.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a probability mass function internally.

The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\lambda) of 2 contacts per infection.

infectious_period

An <epiparameter> can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.

prob_infection

A single numeric for the probability of a secondary contact being infected by an infected primary contact.

onset_to_hosp

A function or an ⁠<epiparameter>⁠ object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.

If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_death

A function or an ⁠<epiparameter>⁠ object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.

If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_recovery

A function or an ⁠<epiparameter>⁠ object for the onset-to-recovery delay distribution. onset_to_recovery can also be NULL to not simulate dates for individuals that recovered.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is NULL so by default cases that recover get an NA in the ⁠$date_outcome⁠ line list column.

reporting_delay

The function can be defined or anonymous. The function must return a vector of numerics for the length of the reporting delay. The function must have a single argument.

The default is NULL so by default there is no reporting delay, and the ⁠$date_reporting⁠ line list column is identical to the ⁠$date_onset⁠ column.

hosp_risk

hosp_death_risk

non_hosp_death_risk

outbreak_start_date

A date for the start of the outbreak.

anonymise

A logical boolean for whether case names should be anonymised. Default is FALSE.

outbreak_size

population_age

case_type_probs

A named numeric vector with the probability of each case type. The names of the vector must be "suspected", "probable", "confirmed". Values of each case type must sum to one.

contact_tracing_status_probs

config

A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.

Value

A ⁠<data.frame>⁠ if sim_type is "linelist" or "contacts", or a list of two ⁠<data.frame>⁠s if sim_type is "outbreak".

Simulate a random network branching process model with a probability of infection for each contact

Description

Simulate a branching process on an infinite network where the contact distribution provides a function to sample the number of contacts of each individual in the simulation. Each contact is then infected with the probability of infection. The time between each contact is assumed to be evenly distributed across the infectious period of the infected individual, and is independent of whether the contact becomes infected.

Usage

.sim_network_bp(
  contact_distribution,
  infectious_period,
  prob_infection,
  max_outbreak_size,
  config
)

Arguments

contact_distribution

A function or an ⁠<epiparameter>⁠ object to generate the number of contacts per infection.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a probability mass function internally.

The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\lambda) of 2 contacts per infection.

infectious_period

An <epiparameter> can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.

prob_infection

A single numeric for the probability of a secondary contact being infected by an infected primary contact.

config

A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.

Details

The contact distribution sampled takes the network effect q(n) \sim (n + 1)p(n + 1) where p(n) is the probability mass function of a distribution, e.g., Poisson or Negative binomial. That is to say, choosing a contact at random by following up a contact selects individuals with a probability proportional to their number of contacts. The plus one is because one of the contacts was "used" to infect the person.

Value

A ⁠<data.frame>⁠ with the contact and transmission chain data.
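The network effect q(n) \sim (n + 1)p(n + 1) from the details can be checked numerically in base R. The sketch below assumes a Poisson contact distribution with a mean of 2 and truncates the support at 50 contacts for normalisation; the variable names are illustrative.

```r
# Excess-degree (network-adjusted) contact distribution:
# q(n) is proportional to (n + 1) * p(n + 1).
lambda <- 2
n <- 0:50
p <- function(k) stats::dpois(k, lambda = lambda)

q_unnormalised <- (n + 1) * p(n + 1)
# Normalise over the truncated support 0:50, where the remaining tail
# probability is negligible for lambda = 2.
q <- q_unnormalised / sum(q_unnormalised)

# For a Poisson distribution the adjusted distribution equals the original.
max_abs_diff <- max(abs(q - p(n)))
```

The Poisson distribution is a special case in which the adjustment leaves the distribution unchanged; for an overdispersed distribution such as the Negative binomial, q(n) shifts probability towards individuals with more contacts.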

Convert `⁠<epiparameter>⁠` or `NULL` to function

Description

An extension to as.function(), particularly the epiparameter as.function S3 method, with the added feature that NULLs are converted into functions that generate a vector of NAs to behave equivalently to the generator functions output from as.function(..., func_type = "generate").

If a function is already passed to as_function it will be returned unchanged.

There is also input checking to error if the input is not an <epiparameter>, a function (closure), or, for onset-to-event distributions, NULL.

Usage

as_function(x)

Arguments

x

A named list containing either <epiparameter>, function or NULL. Named list elements are: "contact_distribution", "infectious_period", "onset_to_hosp", "onset_to_death", "onset_to_recovery".

Value

A list of functions.
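The handling of NULL and function inputs described above can be sketched in base R. The function to_generator_sketch is hypothetical, works on a single element rather than a named list, and omits the <epiparameter> conversion, which in the package relies on the epiparameter as.function method.

```r
# Hypothetical sketch: pass functions through unchanged and turn NULL into
# a generator that returns n missing values.
to_generator_sketch <- function(x) {
  if (is.null(x)) {
    return(function(n) rep(NA_real_, times = n))
  }
  stopifnot(is.function(x))
  x
}

onset_to_recovery <- to_generator_sketch(NULL)
infectious_period <- to_generator_sketch(function(n) stats::rlnorm(n, 2, 0.5))

na_draws <- onset_to_recovery(3)
```

Returning a generator of NAs lets downstream code call every delay in the same way, whether or not it was specified.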

Censor dates in line list

Description

Censor ⁠<Date>⁠ columns in line list output from sim_linelist() to a specified time interval.

This function is similar to incidence2::incidence() but does not aggregate events into an ⁠<incidence2>⁠ object, instead it returns the same line list ⁠<data.frame>⁠ as input but with modified event dates.

Usage

censor_linelist(
  linelist,
  interval,
  reporting_artefact = c("none", "weekend_effects"),
  offset = min(linelist$date_onset, na.rm = TRUE)
)

Arguments

linelist

Line list ⁠<data.frame>⁠ output from sim_linelist().

interval

An integer or character string for the size of the time interval for censoring. Valid character options are:

"daily"
"weekly"
"epiweek"
"monthly"
"yearly"

See details for information on the date/period objects that are returned for each interval type.

reporting_artefact

A character string, either "none" (default) or "weekend_effects". By default none of the dates are altered in other ways during censoring. However, if reporting_artefact = "weekend_effects" then all the dates in the $date_reporting column that fall on a weekend are shifted to the following Monday. This artefact is commonly referred to as the "weekend effect" (see doi:10.1186/s13104-025-07145-y).
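
To make the weekend shift concrete, here is a minimal base R sketch that moves Saturday and Sunday dates to the following Monday. The function name shift_weekend_to_monday() is hypothetical and the code is an illustration of the idea, not the code used by censor_linelist().

```r
# Hypothetical helper: shift dates that fall on a weekend to the next Monday
shift_weekend_to_monday <- function(dates) {
  # POSIXlt weekday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  wday <- as.POSIXlt(dates)$wday
  dates + ifelse(wday == 6, 2, ifelse(wday == 0, 1, 0))
}

# Friday, Saturday, Sunday and Monday in January 2023
date_reporting <- as.Date(
  c("2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09")
)
shifted <- shift_weekend_to_monday(date_reporting)
weekdays(shifted)
```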

offset

An integer or ⁠<Date>⁠ for the value to start counting the period from (0 is the start of the Unix epoch). Only applicable if interval is specified as an integer.

Default date used to start counting from for the ⁠<grates_period>⁠ is the earliest symptom onset date (⁠$date_onset⁠). See grates::as_period() for more information.

If setting reporting_artefact = "weekend_effects" the period may start or end on a weekend.

Details

The line list columns that contain <Date> objects are stored as double-precision numbers by default. In other words, they are not integer values, so they can be part way through a day. The exact numeric value of the <Date> can be seen if you unclass() it.

Censoring line list dates reduces the time precision (window) of the event. Often dates of events, such as symptom onset or hospital admission, are only known to the nearest day, not hour or minute. Other events may be more coarsely censored, for example to the nearest week or month. censor_linelist() converts the exact double-precision event <Date> to the time interval specified.
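
The following base R sketch shows a <Date> that is part way through a day and one way to reduce it to daily precision. Using floor() is an assumption made for illustration; it is not necessarily how censor_linelist() performs daily censoring.

```r
# a date three quarters of the way through 1 January 2023
onset <- as.Date("2023-01-01") + 0.75

# the printed date hides the fractional part, unclass() reveals it
format(onset)
unclass(onset)

# reduce to daily precision by dropping the fractional part of the day
onset_daily <- as.Date(floor(unclass(onset)), origin = "1970-01-01")
unclass(onset_daily)
```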

Depending on the interval specified, the date columns will be returned as different objects. Here is a list of the valid input interval and the resulting class of the date column.

integer -> ⁠<grates_period>⁠ (see grates::as_period())
"daily" -> ⁠<Date>⁠ (see Date)
"weekly" -> ⁠<grates_isoweek>⁠ (see grates::as_isoweek())
"epiweek" -> ⁠<grates_epiweek>⁠ (see grates::as_epiweek())
"monthly" -> ⁠<grates_yearmonth>⁠ (see grates::as_yearmonth())
"yearly" -> ⁠<grates_year>⁠ (see grates::as_year())

Value

A line list ⁠<data.frame>⁠.

Examples

set.seed(1)
linelist <- sim_linelist()
linelist_cens <- censor_linelist(linelist, interval = "daily")

# censor to a 3-day period
linelist_cens <- censor_linelist(linelist, interval = 3)

# no reporting of events on weekends
linelist_cens <- censor_linelist(
  linelist,
  interval = "daily",
  reporting_artefact = "weekend_effects"
)

Coerce and store `⁠<data.frame>⁠` subclass to `⁠<data.frame>⁠` and restore `⁠<data.frame>⁠` subclass to `⁠<data.frame>⁠` from attribute.

Description

Coerce and store ⁠<data.frame>⁠ subclass to ⁠<data.frame>⁠ and restore ⁠<data.frame>⁠ subclass to ⁠<data.frame>⁠ from attribute.

Usage

.as_df(x)

.restore_df_subclass(x)

Arguments

x

An R object.

Value

A ⁠<data.frame>⁠ or subclass of ⁠<data.frame>⁠.

Create a list of configuration settings for some details of `sim_linelist()`

Description

Create a list of configuration settings for some details of sim_linelist()

Usage

create_config(...)

Arguments

...

<dynamic-dots> Named elements to replace default settings. Only if names match exactly are elements replaced, otherwise the function errors.

Accepted arguments and their defaults are:

last_contact_distribution: A function to generate the time for last contact. Default parameterisation is a Poisson distribution with a \lambda of 3.
first_contact_distribution: A function to generate the time for the first contact. Default parameterisation is a Poisson distribution with a \lambda of 3.
ct_distribution: A function to generate Ct values for each confirmed case. Default parameterisation is a Normal distribution with a mean (\mu) of 25 and a standard deviation (\sigma) of 2.
network: A character string, either "adjusted" (default) or "unadjusted".
time_varying_death_risk: By default is NULL, but can also accept a function with two arguments, risk and time, to apply a time varying death risk of hospitalised and non-hospitalised cases in the outbreak simulation. See vignette("time-varying-cfr", package = "simulist").
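
As a sketch of the time_varying_death_risk setting, the function below takes the two documented arguments, risk and time, and returns a risk that declines exponentially. The function name declining_risk, the decay rate of 0.05 and the interpretation of time as days are assumptions made for illustration. The call to create_config() is only run if simulist is installed.

```r
# Hypothetical time-varying death risk: the baseline risk decays
# exponentially with time (decay rate chosen arbitrarily for illustration)
declining_risk <- function(risk, time) risk * exp(-0.05 * time)

# baseline risk of 0.5 evaluated at three time points
declining_risk(risk = 0.5, time = c(0, 10, 50))

# pass the function to the configuration if {simulist} is available
if (requireNamespace("simulist", quietly = TRUE)) {
  config <- simulist::create_config(time_varying_death_risk = declining_risk)
}
```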

Details

The config argument in sim_linelist() controls finer details of the simulation: the time windows around infections (time of first and last contact with the infector), the distribution of the Cycle threshold (Ct) value from a real-time PCR or quantitative PCR (qPCR) for confirmed cases, the network effect in the simulation, and whether there is a time-varying death risk.

These parameters do not warrant their own arguments in sim_linelist() as they rarely need to be changed from their default setting. Therefore it is not worth increasing the number of sim_linelist() arguments to accommodate these and the config argument keeps the function signature simpler and more readable.

The last_contact_distribution and first_contact_distribution can accept any function that generates positive integers (e.g. discrete probability distribution, rpois() or rgeom()). The ct_distribution can accept any function that generates real numbers (e.g. continuous or discrete probability distribution, rnorm(), rlnorm()).

The network option controls whether to sample contacts from an adjusted or unadjusted contact distribution. Adjusted (default) sampling uses q(n) \sim (n + 1)p(n + 1) where p(n) is the probability mass function of a distribution, e.g., Poisson or Negative binomial. Unadjusted (network = "unadjusted") instead samples contacts directly from a probability distribution p(n).
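
The difference between the two options can be illustrated by sampling from both distributions in base R. This is a sketch under stated assumptions, not the package's sampling code: the Negative binomial parameters, the truncation of the support at 1000 and the sample size are chosen only for the example.

```r
set.seed(1)

# Unadjusted contact distribution p(n): Negative binomial pmf
p <- function(n) stats::dnbinom(x = n, mu = 2, size = 0.5)

# Truncated support (assumption: mass beyond 1000 is negligible)
support <- 0:1000

# Adjusted distribution q(n) ~ (n + 1) p(n + 1), normalised to sum to one
q <- (support + 1) * p(support + 1)
q <- q / sum(q)

# expected number of contacts under each distribution
mean_unadjusted <- sum(support * p(support))
mean_adjusted <- sum(support * q)

# draw contacts with and without the network adjustment
unadjusted <- stats::rnbinom(n = 1e4, mu = 2, size = 0.5)
adjusted <- sample(support, size = 1e4, replace = TRUE, prob = q)
c(unadjusted = mean(unadjusted), adjusted = mean(adjusted))
```

With an overdispersed distribution the adjusted sampling gives more contacts on average, because individuals with many contacts are more likely to be reached.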

Value

A list of settings for sim_linelist().

Examples

# example with default configuration
create_config()

# example with customised Ct distribution
create_config(
  ct_distribution = function(x) rlnorm(n = x, meanlog = 2, sdlog = 1)
)

Create messy line list data

Description

Take line list output from sim_linelist() and replace elements of the ⁠<data.frame>⁠ with missing values (e.g. NA), introduce spelling mistakes and inconsistencies, as well as coerce date types.

Usage

messy_linelist(linelist, ...)

Arguments

linelist

Line list ⁠<data.frame>⁠ output from sim_linelist().

...

<dynamic-dots> Named elements to replace default settings. Only if names match exactly are elements replaced, otherwise the function errors.

Accepted arguments and their defaults are:

prop_missing: A numeric between 0 and 1 for the proportion of missing values introduced. Default is 0.1 (10%).
missing_value: A vector with the missing value(s). If multiple values are supplied a missing value is randomly sampled for each cell in the line list. Default is NA.
prop_spelling_mistakes: A numeric between 0 and 1 used to specify the proportion of spelling mistakes in character columns. Default is 0.1 (10%).
inconsistent_sex: A logical boolean to specify whether the ⁠$sex⁠ column uses "m" and "f", or inconsistently uses "m", "f", "M", "F", "male", "female", "Male" or "Female". Default is TRUE so sexes are sampled from the options.
sex_as_numeric: A logical boolean used to specify whether the values in the ⁠$sex⁠ column should be encoded as numeric values (0 and 1). Default is FALSE. sex_as_numeric cannot be TRUE if inconsistent_sex = TRUE.
numeric_as_char: A logical boolean used to specify whether numeric columns should be coerced to character. Default is TRUE.
date_as_char: A logical boolean used to specify whether Date columns should be coerced to character. Default is TRUE.
inconsistent_dates: A logical boolean used to specify whether the values in Date columns are inconsistently formatted (e.g. "%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", or "%d %B %Y"). Default is FALSE.
prop_int_as_word: A numeric between 0 and 1 for the proportion of elements in integer columns that are coerced to words (see english::words()). Default is 0.5 (50%).
prop_duplicate_row: A numeric between 0 and 1 for the proportion of rows to duplicate. Default is 0.01 (1%). If prop_duplicate_row > 0 then it is guaranteed that at least one row will be duplicated.
inconsistent_id: A logical boolean used to specify whether the ⁠$id⁠ column has inconsistent formatting by appending random prefixes and suffixes to a random sample (~10%) of IDs. Default is FALSE, so IDs are numbers (numeric, characters or words depending on prop_int_as_word and numeric_as_char).

Details

By default messy_linelist():

Sets 10% of values to missing, i.e. converts them to NA.
Introduces spelling mistakes in 10% of character columns.
Introduces inconsistency in the reporting of $sex.
Converts numeric columns (double & integer) to character.
Converts Date columns to character.
Converts 50% of integers to (English) words.
Duplicates 1% of rows.

Setting missing_value to something other than NA will likely cause type coercion in the line list ⁠<data.frame>⁠ columns, most likely to character.
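
The type coercion mentioned above follows from how R vectors work, as this small base R illustration shows. The example values are made up and the code does not call messy_linelist().

```r
# an integer column, as found in a simulated line list
age <- c(34L, 51L, 8L)
class(age)

# a numeric missing value such as -99 coerces the column to double
age_num <- replace(age, 2, -99)
class(age_num)

# a character missing value coerces the whole column to character
age_chr <- replace(age, 2, "unknown")
class(age_chr)
```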

When setting sex_as_numeric to TRUE, male is set to 0 and female to 1. Only one of inconsistent_sex or sex_as_numeric can be TRUE, otherwise the function will error.

If numeric_as_char = TRUE and sex_as_numeric = TRUE then the sex encoded as 0 or 1 is converted to character. If prop_spelling_mistakes > 0 and numeric_as_char = TRUE the columns that are converted from numeric to character do not have spelling mistakes introduced, because they are numeric characters stored as character strings. If prop_spelling_mistakes > 0 and date_as_char = TRUE spelling mistakes are not introduced into dates.

The Date columns can be converted into an inconsistent format by setting inconsistent_dates = TRUE. This requires date_as_char = TRUE; if the latter is FALSE the function will error.

If numeric_as_char = FALSE and prop_int_as_word > 0 then the integer columns are converted to character string (either character numbers or words) but the other numeric columns are not coerced. Spelling mistakes are not introduced into integers converted to words when prop_spelling_mistakes > 0 and prop_int_as_word > 0.

Rows are duplicated after other messy modifications so the duplicated row contains identical messy elements.

Value

A messy line list ⁠<data.frame>⁠.

The output ⁠<data.frame>⁠ has the same structure as the input ⁠<data.frame>⁠ from sim_linelist(), with messy entries.

Examples

linelist <- sim_linelist()
messy_linelist <- messy_linelist(linelist)

# increasing proportion of missingness to 30% with a missing value of -99
messy_linelist <- messy_linelist(
  linelist,
  prop_missing = 0.3,
  missing_value = -99
)

# increasing proportion of spelling mistakes to 50%
messy_linelist <- messy_linelist(linelist, prop_spelling_mistakes = 0.5)

# encode `$sex` as `numeric`
messy_linelist <- messy_linelist(
  linelist,
  sex_as_numeric = TRUE,
  inconsistent_sex = FALSE
)

# inconsistently formatted dates
messy_linelist <- messy_linelist(linelist, inconsistent_dates = TRUE)

Simulate contacts for an infectious disease outbreak

Description

Simulate contacts for an infectious disease outbreak

Usage

sim_contacts(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  contact_tracing_status_probs = c(under_followup = 0.7, lost_to_followup = 0.2,
    unknown = 0.1),
  config = create_config()
)

Arguments

contact_distribution

A function or an ⁠<epiparameter>⁠ object to generate the number of contacts per infection.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a probability mass function internally.

The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\lambda) of 2 contacts per infection.
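
A user-defined contact distribution is a probability mass function of one argument, not a random number generator. The sketch below defines a Negative binomial alternative to the default; the parameter values are arbitrary and chosen only for illustration.

```r
# Negative binomial contact distribution as a probability mass function
# (mu and size are illustrative values, not recommendations)
contact_distribution <- function(x) stats::dnbinom(x = x, mu = 2, size = 0.5)

# probability of 0, 1, 2 and 3 contacts
contact_distribution(0:3)

# a valid mass function sums to (approximately) one over its support
total_mass <- sum(contact_distribution(0:1000))
total_mass
```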

infectious_period

An <epiparameter> can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.

prob_infection

A single numeric for the probability of a secondary contact being infected by an infected primary contact.

outbreak_start_date

A date for the start of the outbreak.

anonymise

A logical boolean for whether case names should be anonymised. Default is FALSE.

outbreak_size

population_age

contact_tracing_status_probs

config

A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.

Value

A contacts ⁠<data.frame>⁠.

The structure of the output is:

from: character column with name of case.
to: character column with name of contacts of case.
age: integer with age of infectee.
sex: character column with either "m" or "f" for the sex of the contact.
date_first_contact: ⁠<Date>⁠ column for the first contact between case and contacts.
date_last_contact: ⁠<Date>⁠ column for the last contact between case and contacts.
was_case: logical boolean column with either TRUE or FALSE for if the contact becomes a case.
status: character column with the status of each contact. By default it is either "case", "under_followup", "lost_to_followup", or "unknown".

Author(s)

Joshua W. Lambert, Carmen Tamayo

Examples

# quickly simulate contact tracing data using the function defaults
contacts <- sim_contacts()
head(contacts)

# to simulate more realistic contact tracing data load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)

contacts <- sim_contacts(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5
)

Simulate a line list

Description

The line list is simulated using a branching process and parameterised with epidemiological parameters.

Usage

sim_linelist(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  onset_to_hosp = function(x) stats::rlnorm(n = x, meanlog = 1.5, sdlog = 0.5),
  onset_to_death = function(x) stats::rlnorm(n = x, meanlog = 2.5, sdlog = 0.5),
  onset_to_recovery = NULL,
  reporting_delay = NULL,
  hosp_risk = 0.2,
  hosp_death_risk = 0.5,
  non_hosp_death_risk = 0.05,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
  config = create_config()
)

Arguments

contact_distribution

A function or an ⁠<epiparameter>⁠ object to generate the number of contacts per infection.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a probability mass function internally.

The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\lambda) of 2 contacts per infection.

infectious_period

An <epiparameter> can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.

prob_infection

A single numeric for the probability of a secondary contact being infected by an infected primary contact.

onset_to_hosp

A function or an ⁠<epiparameter>⁠ object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.

If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_death

A function or an ⁠<epiparameter>⁠ object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.

If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_recovery

A function or an ⁠<epiparameter>⁠ object for the onset-to-recovery delay distribution. onset_to_recovery can also be NULL to not simulate dates for individuals that recovered.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is NULL so by default cases that recover get an NA in the ⁠$date_outcome⁠ line list column.

reporting_delay

The function can be defined or anonymous. The function must return a vector of numerics for the length of the reporting delay. The function must have a single argument.

The default is NULL so by default there is no reporting delay, and the ⁠$date_reporting⁠ line list column is identical to the ⁠$date_onset⁠ column.
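
A reporting delay can be supplied as a single-argument function that returns a numeric vector, as in this sketch. The lognormal parameters are arbitrary and chosen for illustration. The call to sim_linelist() is only run if simulist is installed.

```r
# single-argument function returning a vector of reporting delays
# (lognormal parameters chosen for illustration only)
reporting_delay <- function(x) stats::rlnorm(n = x, meanlog = 1, sdlog = 0.5)

set.seed(1)
delays <- reporting_delay(5)
delays

# simulate a line list with a reporting delay if {simulist} is available
if (requireNamespace("simulist", quietly = TRUE)) {
  linelist <- simulist::sim_linelist(reporting_delay = reporting_delay)
  head(linelist[, c("date_onset", "date_reporting")])
}
```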

hosp_risk

hosp_death_risk

non_hosp_death_risk

outbreak_start_date

A date for the start of the outbreak.

anonymise

A logical boolean for whether case names should be anonymised. Default is FALSE.

outbreak_size

population_age

case_type_probs

A named numeric vector with the probability of each case type. The names of the vector must be "suspected", "probable", "confirmed". Values of each case type must sum to one.

config

A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.

Details

For age-stratified hospitalisation and death risks a <data.frame> will need to be passed to the hosp_risk and/or hosp_death_risk arguments. This <data.frame> should have two columns:

age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive).
risk: a column with one numeric per cell for the proportion (or probability) of hospitalisation for that age group. Should be between 0 and 1.
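
To show how inclusive lower bounds map ages to risks, the following base R sketch looks up the risk for a few ages with findInterval(). This illustrates the table format only; it is an assumption that the package resolves age groups in an equivalent way.

```r
# age-stratified risk table in the documented format
risk_table <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)

# ages on and between the lower bounds of the age groups
ages <- c(3, 5, 42, 80, 90)

# findInterval() returns the index of the age group each age falls into,
# treating each age_limit as an inclusive lower bound
age_group <- findInterval(ages, risk_table$age_limit)
risk_by_age <- risk_table$risk[age_group]
data.frame(age = ages, risk = risk_by_age)
```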

For an age-structured population, a ⁠<data.frame>⁠ with two columns:

age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive), except the last element which is the upper bound (maximum) of the population.
proportion: a column with the proportion of the population that are in that age group. Proportions must sum to one.

Value

A line list <data.frame>.

The structure of the output is:

case_name: character column with name of case.
case_type: character column with type of case. By default it is either "confirmed", "probable", or "suspected".
sex: character column with either "m" or "f" for the sex of the case.
age: integer with age of case.
date_onset: ⁠<Date>⁠ column for date of symptom onset.
date_reporting: ⁠<Date>⁠ column for the date of reporting (i.e. entry into line list).
date_admission: ⁠<Date>⁠ column for date of hospital admission.
outcome: character column with the outcome status of each case. Either "recovered" or "died".
date_outcome: ⁠<Date>⁠ column for the date of outcome.
date_first_contact: ⁠<Date>⁠ column for the first contact between infector and infectee (case).
date_last_contact: ⁠<Date>⁠ column for the last contact between infector and infectee (case).
ct_value: numeric column with the Cycle threshold (Ct) value from qPCR for confirmed cases.

Author(s)

Joshua W. Lambert, Carmen Tamayo

Examples

# quickly simulate a line list using the function defaults
linelist <- sim_linelist()
head(linelist)

# to simulate a more realistic line list load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)

onset_to_hosp <- epiparameter(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  prob_distribution = create_prob_distribution(
    prob_distribution = "lnorm",
    prob_distribution_params = c(meanlog = 1, sdlog = 0.5)
  )
)

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)
# example with single hospitalisation risk for entire population
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = 0.5
)
head(linelist)

# example with age-stratified hospitalisation risk
# 20% for over 80s
# 10% for under 5s
# 5% for the rest
age_dep_hosp_risk <- data.frame(
  age_limit = c(1, 5, 80),
  risk = c(0.1, 0.05, 0.2)
)
linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  hosp_risk = age_dep_hosp_risk
)
head(linelist)

Simulate a line list and a contacts table

Description

The line list and contacts are simulated using a branching process and parameterised with epidemiological parameters.

Usage

sim_outbreak(
  contact_distribution = function(x) stats::dpois(x = x, lambda = 2),
  infectious_period = function(x) stats::rlnorm(n = x, meanlog = 2, sdlog = 0.5),
  prob_infection = 0.5,
  onset_to_hosp = function(x) stats::rlnorm(n = x, meanlog = 1.5, sdlog = 0.5),
  onset_to_death = function(x) stats::rlnorm(n = x, meanlog = 2.5, sdlog = 0.5),
  onset_to_recovery = NULL,
  reporting_delay = NULL,
  hosp_risk = 0.2,
  hosp_death_risk = 0.5,
  non_hosp_death_risk = 0.05,
  outbreak_start_date = as.Date("2023-01-01"),
  anonymise = FALSE,
  outbreak_size = c(10, 10000),
  population_age = c(1, 90),
  case_type_probs = c(suspected = 0.2, probable = 0.3, confirmed = 0.5),
  contact_tracing_status_probs = c(under_followup = 0.7, lost_to_followup = 0.2,
    unknown = 0.1),
  config = create_config()
)

Arguments

contact_distribution

A function or an ⁠<epiparameter>⁠ object to generate the number of contacts per infection.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a probability mass function internally.

The default is an anonymous function with a Poisson probability mass function (dpois()) with a mean (\lambda) of 2 contacts per infection.

infectious_period

An <epiparameter> can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2 and sdlog = 0.5.

prob_infection

A single numeric for the probability of a secondary contact being infected by an infected primary contact.

onset_to_hosp

A function or an ⁠<epiparameter>⁠ object for the onset-to-hospitalisation delay distribution. onset_to_hosp can also be set to NULL to not simulate hospitalisation (admission) dates.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-hospitalisation delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 1.5 and sdlog = 0.5.

If onset_to_hosp is set to NULL then hosp_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_death

A function or an ⁠<epiparameter>⁠ object for the onset-to-death delay distribution. onset_to_death can also be set to NULL to not simulate dates for individuals that died.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-death delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is an anonymous function with a lognormal distribution random number generator (rlnorm()) with meanlog = 2.5 and sdlog = 0.5.

If onset_to_death is set to NULL then non_hosp_death_risk and hosp_death_risk will be automatically set to NULL if not manually specified.

onset_to_recovery

A function or an ⁠<epiparameter>⁠ object for the onset-to-recovery delay distribution. onset_to_recovery can also be NULL to not simulate dates for individuals that recovered.

The function can be defined or anonymous. The function must return a vector of numerics for the length of the onset-to-recovery delay. The function must have a single argument.

An ⁠<epiparameter>⁠ can be provided. This will be converted into a random number generator internally.

The default is NULL so by default cases that recover get an NA in the ⁠$date_outcome⁠ line list column.

reporting_delay

The function can be defined or anonymous. The function must return a vector of numerics for the length of the reporting delay. The function must have a single argument.

The default is NULL so by default there is no reporting delay, and the ⁠$date_reporting⁠ line list column is identical to the ⁠$date_onset⁠ column.

hosp_risk

hosp_death_risk

non_hosp_death_risk

outbreak_start_date

A date for the start of the outbreak.

anonymise

A logical boolean for whether case names should be anonymised. Default is FALSE.

outbreak_size

population_age

case_type_probs

A named numeric vector with the probability of each case type. The names of the vector must be "suspected", "probable", "confirmed". Values of each case type must sum to one.

contact_tracing_status_probs

config

A list of settings to adjust the randomly sampled delays and Ct values. See create_config() for more information.

Details

For age-stratified hospitalisation and death risks a <data.frame> will need to be passed to the hosp_risk and/or hosp_death_risk arguments. This <data.frame> should have two columns:

age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive).
risk: a column with one numeric per cell for the proportion (or probability) of hospitalisation for that age group. Should be between 0 and 1.

For an age-structured population, a ⁠<data.frame>⁠ with two columns:

age_limit: a column with one numeric per cell for the lower bound (minimum) age of the age group (inclusive), except the last element which is the upper bound (maximum) of the population.
proportion: a column with the proportion of the population that are in that age group. Proportions must sum to one.

Value

A list with two elements:

A line list ⁠<data.frame>⁠ (see sim_linelist() for ⁠<data.frame>⁠ structure)
A contacts ⁠<data.frame>⁠ (see sim_contacts() for ⁠<data.frame>⁠ structure)

Author(s)

Joshua W. Lambert

Examples

# quickly simulate an outbreak using the function defaults
outbreak <- sim_outbreak()
head(outbreak$linelist)
head(outbreak$contacts)

# to simulate a more realistic outbreak load epiparameters from
# {epiparameter}
library(epiparameter)
contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 1, scale = 1)
  )
)

onset_to_hosp <- epiparameter(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  prob_distribution = create_prob_distribution(
    prob_distribution = "lnorm",
    prob_distribution_params = c(meanlog = 1, sdlog = 0.5)
  )
)

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)

outbreak <- sim_outbreak(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death
)

Adjust or subset a line list to account for right truncation

Description

Adjust or subset the line list ⁠<data.frame>⁠ by removing cases that have not been reported by the truncation time and setting hospitalisation admission or outcome dates that are after the truncation point to NA.

This is to replicate real-time outbreak data where recent cases or outcomes are not yet observed or reported (right truncation). It implies an assumption that symptom onsets are reported with a delay but hospitalisations are reported instantly.
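
The two steps described above can be sketched on a toy line list in base R. This is an illustration rather than the package's code: the toy data, the cutoff date and the use of $date_reporting to decide which cases have been reported are assumptions.

```r
# toy line list with three cases (made-up dates)
linelist <- data.frame(
  case = c("A", "B", "C"),
  date_reporting = as.Date(c("2023-01-05", "2023-01-20", "2023-02-10")),
  date_admission = as.Date(c("2023-01-08", "2023-02-05", NA)),
  date_outcome = as.Date(c("2023-01-15", NA, "2023-02-20"))
)
cutoff <- as.Date("2023-02-01")

# step 1: remove cases that have not been reported by the truncation time
truncated <- linelist[linelist$date_reporting <= cutoff, ]

# step 2: set admission and outcome dates after the truncation time to NA
for (col in c("date_admission", "date_outcome")) {
  after_cutoff <- !is.na(truncated[[col]]) & truncated[[col]] > cutoff
  truncated[[col]][after_cutoff] <- NA
}
truncated
```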

Usage

truncate_linelist(
  linelist,
  truncation_day = 14,
  unit = c("days", "weeks", "months", "years"),
  direction = c("backwards", "forwards")
)

Arguments

linelist

Line list ⁠<data.frame>⁠ output from sim_linelist().

truncation_day

A single numeric specifying the number of days (default), weeks, months or years before the end of the outbreak (default) or since the start of the outbreak (see direction argument) to truncate the line list at. By default it is 14 days before the end of the outbreak.

Alternatively, truncation_day can accept a <Date>, which is used as the truncation date; the unit and direction arguments are then ignored.

unit

A character string, either "days" (default), "weeks", "months", or "years", specifying the units of the truncation_day argument.

Years are assumed to be 365.25 days and months are assumed to be 365.25 / 12 days (same as lubridate).

direction

A character string, either "backwards" (default) or "forwards". direction = "backwards" defines the truncation_day as the time before the end of the outbreak. direction = "forwards" defines the truncation_day as the time since the start of the outbreak.

Details

The day on which the line list is truncated is the same for all individuals in the line list, and is specified by the truncation_day and unit arguments.
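
The unit conversion and the two directions can be sketched in base R as follows. The helper to_days() and the outbreak start and end dates are hypothetical; the conversion factors are those stated for the unit argument.

```r
# Hypothetical helper: convert a truncation time in a given unit to days
to_days <- function(value, unit = c("days", "weeks", "months", "years")) {
  unit <- match.arg(unit)
  days_per_unit <- switch(
    unit,
    days = 1,
    weeks = 7,
    months = 365.25 / 12,
    years = 365.25
  )
  value * days_per_unit
}

# made-up outbreak start and end dates
outbreak_start <- as.Date("2023-01-01")
outbreak_end <- as.Date("2023-06-30")

# direction = "backwards": 3 weeks before the end of the outbreak
cutoff_backwards <- outbreak_end - to_days(3, unit = "weeks")

# direction = "forwards": 2 months since the start of the outbreak
cutoff_forwards <- outbreak_start + to_days(2, unit = "months")

cutoff_backwards
cutoff_forwards
```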

Value

A line list ⁠<data.frame>⁠.

The output ⁠<data.frame>⁠ has the same structure as the input ⁠<data.frame>⁠ from sim_linelist(), but can be a subset and dates after truncation set to NA.

Examples

set.seed(1)
linelist <- sim_linelist()
linelist_trunc <- truncate_linelist(linelist)

# set truncation point 3 weeks before the end of outbreak
linelist_trunc <- truncate_linelist(
  linelist,
  truncation_day = 3,
  unit = "weeks"
)

# set truncation point to 2 months since the start of outbreak
linelist_trunc <- truncate_linelist(
  linelist,
  truncation_day = 2,
  unit = "months",
  direction = "forwards"
)

# set truncation point to 2023-03-01
linelist_trunc <- truncate_linelist(
  linelist,
  truncation_day = as.Date("2023-03-01")
)
