---
title: "4 - Database Visualizing"
format:
  html:
    toc: true
vignette: >
  %\VignetteIndexEntry{4 - Database Visualizing}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: '#>'
---

## Introduction

You have imported your database, and now you may want to visualize parts of it. There are many ways to do so; EDCimport provides functions for a few common ones.

As in the previous vignettes, we will use `edc_example()`, but with real data you should use one of the EDC reading functions. See `vignette("reading")` for details.

```{r}
#| warning: false
library(EDCimport)
library(dplyr)

# Example database, with harmonized subject IDs and cleaned column names
db = edc_example(N=200) %>% 
  edc_unify_subjid() %>% 
  edc_clean_names()
db

# Make the datasets available in the current environment
load_database(db)
```

## Swimmer plot

Each patient experiences a series of events during their visits, recorded in Date/Datetime columns across your datasets. A simple and effective way to identify errors and inconsistencies is to draw a swimmer plot of these columns. This visualization helps you quickly spot incorrect sequences, data entry errors, or unexpected time gaps.

For example, you can check that no experimental treatment was administered before enrollment and that the latest recorded date appears in the follow-up dataset.

```{r}
# Use each patient's enrollment date as time zero
edc_swimmerplot(origin="enrol$enrol_date")
```

A convenient way to perform these checks is to use the interactive version of the plot, with `plotly=TRUE`. Although it cannot be displayed within this vignette, it can be saved as a standalone HTML file for easy sharing.

``` r
sp = edc_swimmerplot(plotly=TRUE)
sp  # display the interactive plot
save_plotly(sp, "swimmerplot.html")  # save it as a standalone HTML file
```

## CRF completion plot

With `edc_crf_plot()`, you can generate a barplot showing the distribution of CRF statuses (Complete, Incomplete, ...) for each dataset in the database.

```{r}
edc_crf_plot()
```

## Patient gridplot

With `edc_patient_gridplot()`, you can visualize which patients are present in each dataset and identify problematic missing records.
```{r}
edc_patient_gridplot()
```

## Population plot

With `edc_population_plot()`, you can visualize the different analysis populations. For simplicity, we build them here by excluding patient IDs with `setdiff()`, but with real data you would more likely derive them with `dplyr::filter()`.

```{r}
# Total population: all screened patients
pop_total <- 1:100 %>% setdiff(12) #Software error, SUBJID attributed twice

# ITT (Intent-to-Treat): All randomized patients (excluding screening failures only)
pop_itt <- pop_total %>% setdiff(55) #Screening failure

# mITT (Modified ITT): All treated patients
pop_m_itt <- pop_itt %>% setdiff(68) #Patient 68 randomized but never received treatment

# PP (Per-Protocol): Patients who completed treatment without major protocol deviations
pop_pp <- pop_m_itt %>% setdiff(c(33, 79)) #Major deviations

# Safety: All patients who received at least one dose of treatment
pop_safety <- pop_itt %>% setdiff(68) #Same as mITT

# Evaluable: Patients who completed required assessments for primary endpoint
pop_evaluable <- pop_itt %>% setdiff(c(44, 91)) #No primary endpoint assessment

l = list(
  "Total population"=pop_total,
  "ITT population"=pop_itt,
  "mITT population"=pop_m_itt,
  "PP population"=pop_pp,
  "Safety population"=pop_safety,
  "Evaluable population"=pop_evaluable
)

# Plot every population except the first one, using the total population as reference
edc_population_plot(l[-1], ref=pop_total)
```
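With real data, populations are usually derived from patient-level variables rather than hard-coded IDs. The following is a minimal sketch of that approach using `dplyr::filter()`, assuming a hypothetical patient-level table `patients` with made-up columns (`randomized`, `n_doses`, `major_deviation`, `primary_assessed`); in your study, these flags would be built from your own datasets and their names will differ.

``` r
library(dplyr)

# Hypothetical patient-level table (toy data for illustration only)
patients <- tibble(
  subjid           = 1:10,
  randomized       = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  n_doses          = c(6, 6, 0, 0, 3, 6, 6, 2, 6, 6),
  major_deviation  = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  primary_assessed = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Helper: keep the patients matching all conditions and return their IDs
pop <- function(...) patients %>% filter(...) %>% pull(subjid)

pop_total     <- patients$subjid                             # all screened patients
pop_itt       <- pop(randomized)                             # all randomized patients
pop_m_itt     <- pop(randomized, n_doses > 0)                # randomized and treated
pop_pp        <- pop(randomized, n_doses > 0, !major_deviation)
pop_safety    <- pop(n_doses > 0)                            # at least one dose
pop_evaluable <- pop(randomized, primary_assessed)           # primary endpoint assessed

l2 <- list(
  "ITT population"       = pop_itt,
  "mITT population"      = pop_m_itt,
  "PP population"        = pop_pp,
  "Safety population"    = pop_safety,
  "Evaluable population" = pop_evaluable
)

# With EDCimport loaded, this list could then be plotted the same way:
# edc_population_plot(l2, ref = pop_total)
```

Keeping the flags in a single table makes each population definition explicit and easy to review, instead of relying on lists of excluded patient IDs.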