| Title: | Rapid Easy Synthesis to Inform Data Extraction | 
| Version: | 0.3.2 | 
| Description: | Developed to assist researchers with planning analysis, prior to obtaining data from Trusted Research Environments (TREs) also known as safe havens. With functionality to export and import marginal distributions as well as synthesise data, both with and without correlations from these marginal distributions. Using a multivariate cumulative distribution (COPULA). Additionally the International Stroke Trial (IST) is included as an example dataset under ODC-By licence Sandercock et al. (2011) <doi:10.7488/ds/104>, Sandercock et al. (2011) <doi:10.1186/1745-6215-12-101>. | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| VignetteBuilder: | knitr | 
| Suggests: | testthat (≥ 3.0.0), lifecycle, knitr, rmarkdown, DT | 
| Depends: | R (≥ 2.10) | 
| Imports: | dplyr, magrittr, bestNormalize, RDP, methods, tibble, simstudy, matrixcalc | 
| LazyData: | true | 
| Config/testthat/edition: | 3 | 
| URL: | https://hehta.github.io/RESIDE/ | 
| NeedsCompilation: | no | 
| Packaged: | 2024-10-16 22:11:11 UTC; ryan | 
| Author: | Ryan Field  | 
| Maintainer: | Ryan Field <ryan.field@glasgow.ac.uk> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-10-17 17:10:09 UTC | 
RESIDE: Rapid Easy Synthesis to Inform Data Extraction
Description
Developed to assist researchers with planning analysis, prior to obtaining data from Trusted Research Environments (TREs) also known as safe havens. With functionality to export and import marginal distributions as well as synthesise data, both with and without correlations from these marginal distributions. Using a multivariate cumulative distribution (COPULA). Additionally the International Stroke Trial (IST) is included as an example dataset under ODC-By licence Sandercock et al. (2011) doi: 10.7488/ds/104, Sandercock et al. (2011) doi: 10.1186/1745-6215-12-101.
Details
The RESIDE Package
This work was supported by the UKRI Strength in Places Fund (SIPF) Competition, #' project number 107140. The project title is SIPF The Living Laboratory driving economic growth in Glasgow through real world implementation of precision medicine.
Author(s)
Maintainer: Ryan Field ryan.field@glasgow.ac.uk (ORCID)
Authors:
David McAllister david.mcallister@glasgow.ac.uk (ORCID)
Other contributors:
Claudia Geue cladia.geue@glasgow.ac.uk (ORCID) [contributor]
See Also
Useful links:
IST Dataset
Description
The International Stroke Trial Dataset
Usage
IST
Format
A data frame with 19435 rows and 112 columns:
- AGE
 Randomisation data: Age in years
- CMPLASP
 Other data and derived variables: Compliant for aspirin
- CMPLHEP
 Other data and derived variables: Compliant for heparin
- CNTRYNUM
 Other data and derived variables: Country code
- COUNTRY
 Other data and derived variables: Abbreviated country code
- DALIVE
 Recurrent stroke within 14 days: Discharged alive from hospital
- DALIVED
 Recurrent stroke within 14 days: Date Discharged alive from hospital
- DAP
 Data collected on 14 day/discharge form about treatments given in hospital: Non trial antiplatelet drug (Y/N)
- DASP14
 Data collected on 14 day/discharge form about treatments given in hospital: Aspirin given for 14 days or till death or discharge (Y/N)
- DASPLT
 Data collected on 14 day/discharge form about treatments given in hospital: Discharged on long term aspirin (Y/N)
- DAYLOCAL
 Randomisation data: Estimate of local day of week (assuming RDATE is Oxford)
- DCAA
 Data collected on 14 day/discharge form about treatments given in hospital: Calcium antagonists (Y/N)
- DCAREND
 Data collected on 14 day/discharge form about treatments given in hospital: Carotid surgery (Y/N)
- DDEAD
 Other events within 14 days: Dead on discharge form
- DDEADC
 Other events within 14 days: Cause of death (1-Initial stroke/2-Recurrent stroke (ischaemic or unknown /3-Recurrent stroke (haemorrhagic)/4-Pneumonia /5-Coronary heart disease/6-Pulmonary embolism /7-Other vascular or unknown/8-Non-vascular/0-unknown)
- DDEADD
 Date of dead on discharge form (yyyy/mm/dd); NOTE: this death is not necessarily within 14 days of randomisation
- DDEADX
 Other events within 14 days: Comment on death
- DDIAGHA
 Final diagnosis of initial event: Haemorrhagic stroke
- DDIAGISC
 Final diagnosis of initial event: Ischaemic stroke
- DDIAGUN
 Final diagnosis of initial event: Indeterminate stroke
- DEAD1
 Indicator variables for specific causes of death: Initial stroke
- DEAD2
 Indicator variables for specific causes of death: Reccurent ischaemic/unknown stroke
- DEAD3
 Indicator variables for specific causes of death: Reccurent haemorrhagic stroke
- DEAD4
 Indicator variables for specific causes of death: Pneumonia
- DEAD5
 Indicator variables for specific causes of death: Coronary heart disease
- DEAD6
 Indicator variables for specific causes of death: Pulmonary embolism
- DEAD7
 Indicator variables for specific causes of death: Other vascular or unknown
- DEAD8
 Indicator variables for specific causes of death: Non vascular
- DGORM
 Data collected on 14 day/discharge form about treatments given in hospital: Glycerol or manitol (Y/N)
- DHAEMD
 Data collected on 14 day/discharge form about treatments given in hospital: Haemodilution (Y/N)
- DHH14
 Data collected on 14 day/discharge form about treatments given in hospital: Medium dose heparin given for 14 days etc in pilot (combine with above)
- DIED
 Other data and derived variables: Indicator variable for death (1=died; 0=did not die)
- DIVH
 Data collected on 14 day/discharge form about treatments given in hospital: Non trial intravenous heparin (Y/N)
- DLH14
 Data collected on 14 day/discharge form about treatments given in hospital: Low dose heparin given for 14 days or till death/discharge (Y/N)
- DMAJNCH
 Data collected on 14 day/discharge form about treatments given in hospital: Major non-cerebral haemorrhage (Y/N)
- DMAJNCHD
 Data collected on 14 day/discharge form about treatments given in hospital: Date of Major non-cerebral haemorrhage (yyyy/mm/dd)
- DMAJNCHX
 Data collected on 14 day/discharge form about treatments given in hospital: Comment of Major non-cerebral haemorrhage
- DMH14
 Data collected on 14 day/discharge form about treatments given in hospital: Date of Major non-cerebral haemorrhage (yyyy/mm/dd)
- DNOSTRK
 Final diagnosis of initial event: Not a stroke
- DNOSTRKX
 Final diagnosis of initial event: Comment on Not a stroke
- DOAC
 Data collected on 14 day/discharge form about treatments given in hospital: Other anticoagulants (Y/N)
- DPE
 Other events within 14 days: Pulmonary embolism
- DPED
 Other events within 14 days: Date of Pulmonary embolism (yyyy/mm/dd)
- DPLACE
 Other events within 14 days: Discharge destination (A-Home /B-Relatives home /C-Residential care /D-Nursing home /E-Other hospital departments /U-Unknown)
- DRSH
 Recurrent stroke within 14 days: Haemorrhagic stroke
- DRSHD
 Recurrent stroke within 14 days: Date of Haemorrhagic stroke (yyyy/mm/dd)
- DRSISC
 Recurrent stroke within 14 days: Ischaemic recurrent stroke
- DRSISCD
 Recurrent stroke within 14 days: Date of Ischaemic recurrent stroke (yyyy/mm/dd)
- DRSUNK
 Recurrent stroke within 14 days: Unknown type
- DRSUNKD
 Recurrent stroke within 14 days: Date of Unknown type (yyyy/mm/dd)
- DSCH
 Data collected on 14 day/discharge form about treatments given in hospital: Non trial subcutaneous heparin (Y/N)
- DSIDE
 Data collected on 14 day/discharge form about treatments given in hospital: Other side effect (Y/N)
- DSIDED
 Data collected on 14 day/discharge form about treatments given in hospital: Date of Other side effect
- DSIDEX
 Data collected on 14 day/discharge form about treatments given in hospital: Comment of Other side effect
- DSTER
 Data collected on 14 day/discharge form about treatments given in hospital: Steroids (Y/N)
- DTHROMB
 Data collected on 14 day/discharge form about treatments given in hospital: Thrombolysis (Y/N)
- DVT14
 Indicator variables for specific causes of death: Indicator of deep vein thrombosis on discharge form
- EXPD14
 Other data and derived variables: Predicted probability of death at 14 days
- EXPD6
 Other data and derived variables: Predicted probability of death at 6 month
- EXPDD
 Other data and derived variables: Predicted probability of death/dependence at 6 month
- FAP
 Data collected at 6 months: On antiplatelet drugs
- FDEAD
 Data collected at 6 months: Dead at six month follow-up (Y/N)
- FDEADC
 Data collected at 6 months: Cause of death (1-Initial stroke /2-Recurrent stroke (ischaemic or unknown) /3-Recurrent stroke (haemorrhagic) /4-Pneumonia /5-Coronary heart disease /6-Pulmonary embolism /7-Other vascular or unknown /8-Non-vascular /0-unknown)
- FDEADD
 Data collected at 6 months: Date of death; NOTE: this death is not necessarily within 6 months of randomisation
- FDEADX
 Data collected at 6 months: Comment on death
- FDENNIS
 Data collected at 6 months: Dependent at 6 month follow-up (Y/N)
- FLASTD
 Data collected at 6 months: Date of last contact
- FOAC
 Data collected at 6 months: On anticoagulants
- FPLACE
 Data collected at 6 months: Place of residance at 6 month follow-up ( A-Home /B-Relatives home /C-Residential care /D-Nursing home /E-Other hospital departments /U-Unknown)
- FRECOVER
 Data collected at 6 months: Fully recovered at 6 month follow-up (Y/N)
- FU1_COMP
 Other data and derived variables: Date discharge form completed
- FU1_RECD
 Other data and derived variables: Date discharge form received
- FU2_DONE
 Other data and derived variables: Date 6 month follow-up done
- H14
 Indicator variables for specific causes of death: Cerebral bleed/heamorrhagic stroke within 14 days; this is slightly wider definition than DRSH an is used for analysis of cerebral bleeds
- HOSPNUM
 Randomisation data: Hospital number
- HOURLOCAL
 Randomisation data: Local time – hours
- HTI14
 Indicator variables for specific causes of death: Indicator of haemorrhagic transformation within 14 days
- ID14
 Other data and derived variables: Indicator of death at 14 days
- ISC14
 Indicator variables for specific causes of death: Indicator of ischaemic stroke within 14 days
- MINLOCAL
 Randomisation data: Local time – minutes
- NCB14
 Indicator variables for specific causes of death: Indicator of any non-cerebral bleed within 14 days
- NCCODE
 Other data and derived variables: Coding of compliance (see Table 3) doi: 10.1186/1745-6215-13-24
- NK14
 Indicator variables for specific causes of death: Indicator of indeterminate stroke within 14 days
- OCCODE
 Other data and derived variables: Six month outcome ( 1-dead /2-dependent /3-not recovered /4-recovered /8 or 9 – missing status
- ONDRUG
 Data collected on 14 day/discharge form about treatments given in hospital: Estimate of time in days on trial treatment
- PE14
 Indicator variables for specific causes of death: Indicator of pulmonary embolism within 14 days
- RASP3
 Randomisation data: Aspirin within 3 days prior to randomisation (Y/N)
- RATRIAL
 Randomisation data: Atrial fibrillation (Y/N); not coded for pilot phase - 984 patients
- RCONSC
 Randomisation data: Conscious state at randomisation (F - fully alert, D - drowsy, U - unconscious)
- RCT
 Randomisation data: CT before randomisation (Y/N)
- RDATE
 Randomisation data: Date of randomisation
- RDEF1
 Randomisation data: Face deficit (Y/N/C=can't assess)
- RDEF2
 Randomisation data: Arm/hand deficit (Y/N/C=can't assess)
- RDEF3
 Randomisation data: Leg/foot deficit (Y/N/C=can't assess)
- RDEF4
 Randomisation data: Dysphasia (Y/N/C=can't assess)
- RDEF5
 Randomisation data: Hemianopia (Y/N/C=can't assess)
- RDEF6
 Randomisation data: Visuospatial disorder (Y/N/C=can't assess)
- RDEF7
 Randomisation data: Brainstem/cerebellar signs (Y/N/C=can't assess)
- RDEF8
 Randomisation data: Other deficit (Y/N/C=can't assess)
- RDELAY
 Randomisation data: Delay between stroke and randomisation in hours
- RHEP24
 Randomisation data: Heparin within 24 hours prior to randomisation (Y/N)
- RSBP
 Randomisation data: Systolic blood pressure at randomisation (mmHg)
- RSLEEP
 Randomisation data: Symptoms noted on waking (Y/N)
- RVISINF
 Randomisation data: Infarct visible on CT (Y/N)
- RXASP
 Randomisation data: Trial aspirin allocated (Y/N)
- RXHEP
 Randomisation data: Trial heparin allocated (M/L/N) \[M is coded as H=high in pilot\]
- SET14D
 Other data and derived variables: Know to be dead or alive at 14 days (1=Yes, 0=No); this does not necessarily mean that we know outcome at 6 monts – see OCCODE for this
- SEX
 Randomisation data: M=male; F=female
- STRK14
 Indicator variables for specific causes of death: Indicator of any stroke within 14 days
- STYPE
 Randomisation data: Stroke subtype (TACS/PACS/POCS/LACS/other)
- TD
 Other data and derived variables: Time of death or censoring in days
- TRAN14
 Indicator variables for specific causes of death: Indicator of major non-cerebral bleed within 14 days
...
Details
Obtained from Sandercock, Peter; Niewada, Maciej; Czlonkowska, Anna. (2011). International Stroke Trial database (version 2), [dataset]. University of Edinburgh. Department of Clinical Neurosciences. doi: 10.7488/ds/104 Under ODC-by licence
Author(s)
Sandercock P et al. Peter.Sandercock@ed.ac.uk
References
doi: 10.7488/ds/104
Export an empty correlation matrix
Description
A function to export a correlation matrix with the required variables as a csv file.
Usage
export_empty_cor_matrix(
  marginals,
  folder_path,
  file_name = "correlation_matrix.csv",
  create_folder = TRUE
)
Arguments
marginals | 
 The marginal distributions  | 
folder_path | 
 Folder to export to.  | 
file_name | 
 (optional) file name, Default: 'correlation_matrix.csv'  | 
create_folder | 
 Whether the folder should be created, Default: TRUE  | 
Details
This function will export an empty correlation matrix
as a csv file, it will contain all the necessary variables including
dummy variables for factors. Dummy variables for factors may contain
a missing category to represent missing data. Correlations should be
added to the empty CSV and the imported using the
import_marginal_distributions function.
Correlations should be supplied using rank order correlations.
The correlation matrix should be symmetric and positive semi definite.
Value
No return value, called for exportation of files.
See Also
import_marginal_distributions
import_cor_matrix
Examples
## Not run: 
 marginals <- import_marginal_distributions()
 export_empty_cor_matrix(
   marginals,
   folder_path = tempdir()
  )
## End(Not run)
Export Marginal Distributions
Description
Export the marginal distributions to CSV files
Usage
export_marginal_distributions(
  marginals,
  folder_path,
  create_folder = FALSE,
  force = FALSE
)
Arguments
marginals | 
 an Object of type RESIDE from
  | 
folder_path | 
 path to folder where to save files.  | 
create_folder | 
 if the folder does not exist should it be created, Default: FALSE  | 
force | 
 if the folder already contains marginal distribution files should they be removed, Default: FALSE  | 
Details
Exports each of the marginal distributions to CSV files within a given folder, along with the continuous quantiles.
Value
No return value, called for exportation of files.
See Also
Examples
  marginal_distributions <- get_marginal_distributions(IST)
  export_marginal_distributions(
    marginal_distributions,
    folder_path = tempdir()
  )
Generate Marginal Distributions for a given data frame
Description
Generate Marginal Distributions from a given data frame with options to specify which variables to use.
Usage
get_marginal_distributions(df, variables = c(), print = FALSE)
Arguments
df | 
 Data frame to get the marginal distributions from  | 
variables | 
 (Optional) variable (columns) to select, Default: c()  | 
print | 
 Whether to print the marginal distributions to the console, Default: FALSE  | 
Details
A function to generate marginal distributions from a given data frame, depending on the variable type the marginals will differ, for binary variables a mean and number of missing is generated for continuous variables, they are first transformed and both mean and sd of the transformed variables are stored along with the quantile mapping for back transformation. For categorical variables, the number of each category is stored, missing values are categorise as "missing".
Value
A list of marginal distributions of an S3 RESIDE Class
See Also
Examples
marginal_distributions <- get_marginal_distributions(
  IST,
  variables <- c(
    "SEX",
    "AGE",
    "ID14",
    "RSBP",
    "RATRIAL"
  )
)
Import a correlation matrix
Description
Imports a correlation matrix from a csv file generated by
export_empty_cor_matrix
Usage
import_cor_matrix(file_path = "./correlation_matrix.csv")
Arguments
file_path | 
 A path to the csv file, Default: './correlation_matrix.csv'  | 
Details
A function to import the user specified correlations
generated from the csv file exported by the
export_empty_cor_matrix function.
Correlations should be entered into the CSV file,
using rank order correlations. The correlation matrix
should be symmetric and be positive semi definite.
Value
a matrix of correlations that can be used with
synthesise_data
See Also
export_empty_cor_matrix
is.positive.semi.definite
Examples
## Not run: 
  import_cor_matrix("correlation_matrix.csv")
## End(Not run)
Import Marginal Distributions
Description
Import the marginal distribution as exported from a Trusted Research Environment (TRE)
Usage
import_marginal_distributions(
  folder_path = ".",
  binary_variables_file = "",
  categorical_variables_file = "",
  continuous_variables_file = "",
  continuous_quantiles_file = "",
  summary_file = "summary.csv"
)
Arguments
folder_path | 
 Where the marginal distribution files are located, Default: '.' see details.  | 
binary_variables_file | 
 filename for the binary_variables file, Default: ” see details.  | 
categorical_variables_file | 
 filename for the categorical variables file , Default: ” see details.  | 
continuous_variables_file | 
 filename for the continuous variables file, Default: ” see details.  | 
continuous_quantiles_file | 
 filename for the continuous quantiles file, Default: ” see details.  | 
summary_file | 
 filename for the summary file, Default: 'summary.csv' see details.  | 
Details
This function will import marginal distributions as generated
within a Trusted Research Environment (TRE) using the function
export_marginal_distributions.
The folder_path allows the path of the files
provided by the TRE to be imported,
this will default to the current working directory.
The file parameters will provide the default file names
if no filenames are specified.
Value
Returns an object of a RESIDE class
See Also
Examples
## Not run: 
  marginals <- import_marginal_distributions()
## End(Not run)
print.RESIDE
Description
S3 override for print RESIDE
Usage
## S3 method for class 'RESIDE'
print(x, ...)
Arguments
x | 
 an object of class RESIDE  | 
... | 
 Other parameters currently none are used  | 
Details
S3 Override for RESIDE Class
Value
No return value, called to print to the terminal.
Examples
print(
  marginal_distributions <- get_marginal_distributions(
    IST,
    variables <- c(
      "SEX",
      "AGE",
      "ID14",
      "RSBP",
      "RATRIAL"
    )
  )
)
Synthesise data from marginal distributions
Description
Allows the synthesis of data from marginal distributions obtained from a Trusted Research Environment (TRE)
Usage
synthesise_data(marginals, correlation_matrix = NULL, ...)
synthesize_data(marginals, correlation_matrix = NULL, ...)
Arguments
marginals | 
 an object of class RESIDE  | 
correlation_matrix | 
 Correlation Matrix
see   | 
... | 
 Additional parameters currently none are used.  | 
Details
This function will synthesise a dataset from marginals imported
using import_marginal_distributions.
By default the dataset will not contain correlations,
however user specified correlations can be added using
the correlation_matrix parameter,
see export_empty_cor_matrix and
import_cor_matrix for more details.
Value
a data frame of simulated data
See Also
export_empty_cor_matrix
import_cor_matrix
Examples
## Not run: 
   marginals <- import_marginal_distributions()
   df <- synthesise_data(marginals)
## End(Not run)