--- title: "Expanding omopgenerics" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{expanding_omopgenerics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction **omopgenerics** defines an `ecosystem` of methods and classes particularly the ** class that can be expanded. Currently there are two packages that define cdm sources: - [omopgenerics](https://darwin-eu.github.io/omopgenerics/) defines the ** source that defines the implementation of a 'local' (in memory) cdm source. - [CDMConnector](https://darwin-eu.github.io/CDMConnector/) defines the ** source that defines a general implementation for *DBI* connections. In this vignette we explain how to expand the **omopgenerics** ecosystem defining more sources. ## The source object First we need to define a function to create our source object: the source object must be an object (usually a list) that contains several attributes that will be used in the methods to fulfill their purpose. Finally we have to assign a class to our source and validate it with `omopgenerics::newCdmSource()`. The function has an argument to assign a `sourceType` that must be a character vector that identifies the name of the source. This is what will be retrieved by the `omopgenerics::sourceType()` function and it will be useful to identify how the source of the cdm_reference has been created. Example how the creation of a new source would look like: ```{r, eval = FALSE} myCustomSource <- function(argument1, argument2, ...) { # pre calculation and validation of arguments ... # create the source object obj <- list(x = x, y = y, ...) # this way you would access the attributes like: obj$x # or obj <- structure(.Data = list(), x = x, y = y, ...) # this you would access the attributes like: attr(obj, "x") # assign class class(obj) <- "my_custom_source" # validation omopgenerics::newCdmSource(src = obj, sourceType = "my_custom_type") } ``` If the first function that we create is `myCustomSource()` the validation with `omopgenerics::newCdmSource()` will fail as inside the *methods* are checked to be defined and work properly. ## Methods You will need to write 4 to 7 methods for your new ``: - `insertTable` **required** To insert local data into your source. - `compute` **required** To compute a 'query' into a table in your source. - `listSourceTables` **required** To list the data present into your source. - `dropSourceTable` **required** To drop a table from your source. - `readSourceTable` **recommended** To read a table from your source. - `insertCdmTo` **recommended** To insert a cdm into your source. - `summary` **recommended** To summarise and report the properties of your source. ### `insertTable` **Purpose**: To insert a local table into your source object. **Arguments**: - `cdm` The cdm argument will be your source object created with `myCustomSource()` function. - `name` The name to identify that table in your source. - `table` A local `` to insert in your source. - `overwrite` (by default TRUE), whether to overwrite if the table exists in the database. - `temporary` (by default FALSE), whether the table must be temporary. **Output**: The output of a insertTable must be a `cdm_table` so your function must at the end validate it with `omopgenerics::newCdmTable().` Sketch of how the function should look like: ```{r, eval = FALSE} #' @export #' @importFrom omopgenerics insertTable insertTable.my_custom_source <- function(cdm, name, table, overwrite, temporary) { # code to insert the table into your source x <- "...." # it must be a reference to your table # validate output omopgenerics::newCdmTable(table = x, src = cdm, name = name) } ``` ### `listSourceTables` **Purpose**: To list tables that are present in the source. **Arguments**: - `cdm` The cdm argument will be your source object created with `myCustomSource()` function. **Output**: A character vector with the names of tables present in source, empty identifiers `""` will be eliminated by `omopgenerics`. Sketch of how the function should look like: ```{r, eval = FALSE} #' @export #' @importFrom omopgenerics listSourceTables listSourceTables.my_custom_source <- function(cdm) { # code to list the tables present in source (cdm) x <- "...." return(x) } ``` ### `readSourceTable` **Purpose**: To read tables that are present in the source. **Arguments**: - `cdm` The cdm argument will be your source object created with `myCustomSource()` function. - `name` Name to identify the table in your source. **Output**: The output of a readSourceTable must be a `cdm_table` so your function must at the end validate it with `omopgenerics::newCdmTable().` Sketch of how the function should look like: ```{r, eval = FALSE} #' @export #' @importFrom omopgenerics readSourceTable readSourceTable.my_custom_source <- function(cdm, name) { # code to read the table 'name' from source. x <- "...." # validate as cdm_table omopgenerics::newCdmTable(table = x, src = cdm, name = name) } ``` ### `dropSourceTable` **Purpose**: To drop a table from your source. **Arguments**: - `cdm` The cdm argument will be your source object created with `myCustomSource()` function. - `name` Name identifier for the table that you want to drop. **Output**: The output is ignored, would recommend to return the source. Sketch of how the function should look like: ```{r, eval = FALSE} #' @export #' @importFrom omopgenerics dropSourceTable dropSourceTable.my_custom_source <- function(cdm, name) { # code to drop the table `name` present in source (cdm) return(invisible(cdm)) } ``` ### `insertCdmTo` **Purpose**: To insert a cdm to your source. **Arguments**: - `cdm` A cdm reference from a different source. Recommend to collect each table before inserting. - `to` The 'to' argument will be your source object created with `myCustomSource()` function. **Output**: The output should be the cdm reference object inserted in your `to` source. Sketch of how the function should look like: ```{r, eval = FALSE} #' @export #' @importFrom omopgenerics dropSourceTable insertCdmTo.my_custom_source <- function(cdm, to) { # example of how it can look like: tables <- names(cdm) |> rlang::set_names() |> purrr::map(\(x) omopgenerics::insertTable(cdm = to, name = x, table = dplyr::as_tibble(cdm[[x]]))) omopgenerics::newCdmReference( tables = tables, cdmName = omopgenerics::cdmName(x = cdm), cdmVersion = omopgenerics::cdmVersion(x = cdm) ) } ``` The content of the function can vary depending of your source. ### `summary` **Purpose**: To summarise the metadata of your source object. **Arguments**: - `object` The 'object' argument will be your source object created with `myCustomSource()` function. - `...` For consistency. **Output**: A named list of metadata of the source, each element must be a string of length 1. Sketch of how the function should look like: ```{r, eval = FALSE} #' @export summary.my_custom_source <- function(object, ...) { # extract metadata metadata1 <- "..." metadata2 <- "..." metadata3 <- "..." list(metadata1 = metadata1, metadata2 = metadata2, metadata3 = metadata3) } ``` ### `compute` This function works slightly different to the rest the input it will be a query instead of the source object. **Purpose**: To compute a table into a permanent placeholder in your source. **Arguments**: - `x` Query to compute. - `name` The name to identify the resultant table in your source. - `temporary` (by default FALSE), whether the table must be temporary. - `overwrite` (by default TRUE), whether to overwrite if the table exists in the database. - `...` For consistency. **Output**: The output of a compute must be a reference to your table in your source data, it will be converted later to a cdm_table (but you do not have to worry about that). Sketch of how the function should look like: ```{r, eval = FALSE} #' @export #' @importFrom dplyr compute compute.my_custom_source <- function(x, name, overwrite, temporary, ...) { # code to compute the table into your source x <- "...." # it must be a reference to your table return(x) } ``` ## The cdm reference object Finally every `` class object would also need a function to create a ** to do that you just have to read all the tables that you want to include in your cdm object. **tables** must be a list of ** with the same source. ```{r, eval = FALSE} cdmFromMyCustomSource <- function(argument1, argument2, ...) { # read and prepare the cdm tables ... # return the cdm object omopgenerics::newCdmReference( tables = tables, # list of cdm and achilles standard tables cdmName = "...", # usually provided as input, but also you might want to search in the cdm_source cdmVersion = "..." # either "5.3" or "5.4" ) } ``` If you want to add ** to your object do it after the initial cdm creation like: ```{r, eval = FALSE} # read from source cdm <- readSourceTable(cdm = cdm, name = "my_cohort") # or insert from local cdm <- insertTable(cdm = cdm, name = "my_cohort", table = localCohort) cdm$my_cohort <- cdm$my_cohort |> newCohortTable( cohortSetRef = cohort_set, # table with the settings of the cohort_table cohortAttritionRef = cohort_attrition, # table with the attrition of the cohort_table cohortCodelistRef = cohort_codelist # table with the codelists of the cohort_table ) ``` This step can be included in your cdm object creation if you wish: ```{r, eval = FALSE} cdmFromMyCustomSource <- function(argument1, argument2, ..., cohortTables) { # read and prepare the cdm tables ... # return the cdm object cdm <- omopgenerics::newCdmReference( tables = tables, # list of cdm and achilles standard tables cdmName = "...", # usually provided as input, but also you might want to search in the cdm_source cdmVersion = "..." # either "5.3" or "5.4" ) # read cohort tables readSourceTable(cdm = cdm, name = cohortTables) } ```