--- title: "summaryTable" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{summaryTable} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", echo = TRUE, eval = TRUE, warning=FALSE, fig.height = 6, fig.width = 9, fig.align='center' ) ``` ```{r color-function, echo = FALSE} colorize <- function(text, color) { if (knitr::is_latex_output()) { sprintf("\\textcolor{%s}{%s}", color, text) } else if (knitr::is_html_output()) { sprintf("%s", color, text) } else text } ``` ```{r setup, echo = FALSE, message = FALSE, warning = FALSE} library(dplyr) library(tidyverse) library(gtsummary) library(summarySCI) library(flextable) ``` The function `summaryTable()` produces a table with descriptive statistics for continuous, categorical and dichotomous variables. It is based on the function `gtsummary::tbl_summary()`, with several enhancements and simplifications, such as - Simplified syntax for easier and more intuitive use. - Display of missing values for categorical variables: Option to show (or not) the percentage of missing values next to the count. - Columns with the number of non-missing observations can be added for each group ## Setup and data To demonstrate the various functionalities of the function we will use the dataset `survival::colon`. ```{r, message = FALSE} library(survival) data(cancer, package="survival") colon1 <- colon %>% group_by(id) %>% slice(1) %>% # Select the first row within each id group ungroup() ``` ```{r, echo = FALSE} n_patients <- nrow(colon) ``` The dataset `colon` contains data of `r n_patients` patients from one of the first successful trials of adjuvant chemotherapy for colon cancer. For simplicity, we focus here on recurrence only, two treatment groups, and four variable: - the treatment group (`rx`), - the sex (`Male`), - the age (`age`) and - the extent of local spread (`extent`). We also add a few missing values for the variable `extent`. ```{r} set.seed(123) colon2 <- colon1 %>% select(rx, sex, age, extent) %>% filter(rx != "Lev") %>% mutate(rx = if_else(rx == "Obs", "Control", rx), extent = if_else(row_number() %in% sample(row_number(), size = round(0.1 * n())), NA, extent)) %>% rename(Male = sex) %>% mutate(extent = as.factor(extent)) ``` ```{r} head(colon2) ``` ## Simple table By default, the function produces a table with all variables present in the dataset. ```{r} summaryTable(data = colon2) ``` If only specific variables are to be included, they need to be entered in the argument `vars`. The argument `group` allows the summary statistics to be stratified by this variable. ```{r} summaryTable(data = colon2, vars = c("Male", "age", "extent"), group = "rx") ``` ### Displayed name of variables The displayed name of each variable is - the label if it exists in the dataset, or - the variable name if no label is present in the dataset (which is the case in our example). In order to customize the displayed name, the argument `labels` can be used. Please note that the labels need to be entered as a list, as shown below: ```{r} summaryTable(data = colon2, group = "rx", labels = list(age = "Age", extent = "Extent")) ``` ## Adding number of observations The number of observations **which are not missing values** are by default added in a new column. This can be disabled by setting the argument `add_n` to `FALSE`. ```{r} summaryTable(data = colon2, group = "rx", labels = list(rx = "Arm", age = "Age", extent = "Extent"), add_n = FALSE) ``` ## Overall column An "overall" column can be added by setting the argument `overall` to `TRUE`. ```{r} summaryTable(data = colon2, group = "rx", overall = TRUE, labels = list(age = "Age", extent = "Extent")) ``` ## Variable types The function `gtsummary::tbl_summary` considers numeric variables with fewer than 10 unique values as categorical by default. This is not the case in the function `summaryTable`. Per default, all numeric variables are considered as continuous, unless they only have two unique values: 0 and 1. In that case, they are considered as dichotomous. This can be changed by setting the argument `continuous_as` to `categorical`. For dichotomous variables, all levels are displayed by default. To show only one row, use the argument `dichotomous_as = dichotomous`. The reference level is specified using the argument `value = list(variable ~ "level to show")`. ```{r} summaryTable(data = colon2, group = "rx", vars = "Male", labels = list(age = "Age"), dichotomous_as = "dichotomous", value = list(Male ~ "1"), missing = FALSE) ``` By default, the function plots the median and range for continuous variables. A number of other options are available, using the argument `stat_cont`. ### Statistic type The statistics to be displayed can be chosen using the argument `stat_cont` (options: `median_IQR`, `median_range` (default), `"mean_sd"`, `"mean_se"` and `"geomMean_sd"`) and `stat_cat` (options: `"n_percent"` (default) `"n"` and `"n_N"`). ```{r} summaryTable(data = colon2, group = "rx", stat_cont = "median_IQR", stat_cat = "n_N", labels = list(age = "Age", sex = "Sex", extent = "Extent")) ``` ## Tests By default, no p-value and confidence (CI) are displayed. p-values can be added by setting `test` to `TRUE` and CI by setting `ci` to `TRUE`. The default test type for continuous variable is `wilcox.test`, and `fisher.test` for categorical variables. This can be changed in `test_cont` and `test_cat`, respectively. The default CI type for continuous variables is `wilcox.test` and `wilson` for categorical variables. This can be changed in `ci_cont` and `ci_cat`, respectively. ```{r} summaryTable(data = colon2, group = "rx", vars = c("age", "extent"), stat_cont = "mean_sd", test = TRUE, ci = TRUE, labels = list(age = "Age", extent = "Extent") ) ``` ## Missing values Per default, missing values are shown as a separate category. This can be disabled by setting `missing` to `FALSE`. For `missing = TRUE`, the percentage are automatically added next to the missing number. This can be disabled by setting the argument `missing_percentage` to `FALSE`. ```{r} summaryTable(data = colon2, group = "rx", vars = "extent", test = TRUE, ci = TRUE, missing_percent = FALSE, labels = list(extent = "Extent") ) summaryTable(data = colon2, group = "rx", vars = "extent", test = TRUE, ci = TRUE, missing_percent = TRUE, labels = list(extent = "Extent") ) ``` The tables with and without missing values can also be put next to each other by setting `missing` to `"both"`. ```{r} summaryTable(data = colon2, group = "rx", vars = "extent", missing_percent = "both", test = TRUE, labels = list(extent = "Extent") ) ``` ## Further customization Digits can be customized with the arguments `digits_cont` and `digits_cat`. The argument `as_flex_table` (default to `TRUE`) converts the gtsummary object to a flextable object, which is better for Word output. # Next steps The argument `type` will be introduced in a future release to enable more fine-grained customization of the variables types.