---
title: "Exploratory Analysis"
author: "Jana Stoilova, Adrian Waddell and Gabriel Becker"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Exploratory Analysis}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
suggested_dependent_pkgs <- c("dplyr")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = all(vapply(
    suggested_dependent_pkgs,
    requireNamespace,
    logical(1),
    quietly = TRUE
  ))
)
```

```{r, echo=FALSE}
knitr::opts_chunk$set(comment = "#")
```

```{css, echo=FALSE}
.reveal .r code {
    white-space: pre;
}
```

## Introduction

This vignette introduces `qtable()`, a function for quickly creating
cross tabulations during exploratory data analysis. `qtable()` extends
`table()` from base R and can do much more than create two-way
contingency tables. The function has a simple interface, while
internally it builds layouts using the `rtables` framework.

## Getting Started

Load packages used in this vignette:

```{r, message=FALSE}
library(rtables)
library(dplyr)
```

Let's start by seeing what `table()` can do:

```{r}
table(ex_adsl$ARM)
table(ex_adsl$SEX, ex_adsl$ARM)
```

We can easily recreate the cross-tables above with `qtable()` by
specifying a data.frame with variable(s) to tabulate. The `col_vars` and
`row_vars` arguments control how to split the data across columns and rows
respectively.

```{r}
qtable(ex_adsl, col_vars = "ARM")
qtable(ex_adsl, col_vars = "ARM", row_vars = "SEX")
```

Aside from the display style, the main difference is that `qtable()`
adds (N=xx) to the table header by default. This can be removed by
setting `show_colcounts = FALSE`.

```{r}
qtable(ex_adsl, "ARM", show_colcounts = FALSE)
```

Variables used as row or column facets must not contain empty strings
(""), because non-empty values are required as labels when generating
the table. The code below will generate an error.

```{r, eval = FALSE}
tmp_adsl <- ex_adsl
tmp_adsl$new <- rep_len(c("", "A", "B"), nrow(tmp_adsl))

qtable(tmp_adsl, row_vars = "new")
```
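
One possible workaround, sketched here under the assumption that the empty strings stand for missing values, is to recode them to a non-empty label before tabulating. The label `"<Missing>"` is an arbitrary choice made for illustration.

```{r}
library(rtables)

tmp_adsl <- ex_adsl
tmp_adsl$new <- rep_len(c("", "A", "B"), nrow(tmp_adsl))

# Replace empty strings with a non-empty placeholder label (arbitrary choice)
tmp_adsl$new[tmp_adsl$new == ""] <- "<Missing>"

qtable(tmp_adsl, row_vars = "new")
```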

## Nested Tables

Providing more than one variable name for the row or column structure in
`qtable()` will create a nested table. Arbitrary nesting is supported in
each dimension.

```{r}
qtable(ex_adsl, row_vars = c("SEX", "STRATA1"), col_vars = c("ARM", "STRATA2"))
```

Note that by default, unobserved factor levels within a facet are not
included in the table. This can be changed by setting
`drop_levels = FALSE`. The code below adds a row of 0s for `STRATA1`
level "B" nested under the `SEX` level "UNDIFFERENTIATED".

```{r}
qtable(
  ex_adsl,
  row_vars = c("SEX", "STRATA1"),
  col_vars = c("ARM", "STRATA2"),
  drop_levels = FALSE
)
```

In contrast, `table()` cannot return a nested table. When more than two
variables are used as inputs, it produces a multi-dimensional array of
counts, which is printed as a series of two-way contingency tables.

```{r}
table(ex_adsl$SEX, ex_adsl$STRATA1, ex_adsl$ARM, ex_adsl$STRATA2)
```
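
The structure of that result can be inspected directly. The short sketch below stores the same call in a variable (the name `tab4` is only illustrative) and looks at its class and dimensions.

```{r}
library(rtables)

tab4 <- table(ex_adsl$SEX, ex_adsl$STRATA1, ex_adsl$ARM, ex_adsl$STRATA2)

class(tab4)       # a "table" object, i.e. an array of counts
length(dim(tab4)) # one dimension per input variable
dim(tab4)         # number of levels of each variable
```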

With some help from `stats::ftable()`, the nested structure can be
achieved in two steps.

```{r}
t1 <- ftable(ex_adsl[, c("SEX", "STRATA1", "ARM", "STRATA2")])
ftable(t1, row.vars = c("SEX", "STRATA1"))
```

## NA Values

In all the examples so far, counts summarize the data in each table cell,
as this is the default analysis used by `qtable()`. Internally, a single
analysis variable specified by `avar` is used to generate the counts in the
table. The default analysis variable is the first variable in `data`. In the
case of `ex_adsl` this is "STUDYID".

Let's see what happens when we introduce some `NA` values into the
analysis variable:

```{r}
tmp_adsl <- ex_adsl
tmp_adsl[[1]] <- NA_character_

qtable(tmp_adsl, row_vars = "ARM", col_vars = "SEX")
```

The resulting table shows 0s in all cells because all the values of the
analysis variable are `NA`.

Keep this behavior in mind when doing quick exploratory analysis using the
default count aggregation of `qtable()`.

If this does not suit your purpose, you can either pre-process your data to
re-code the `NA` values or use another analysis function. We will see how
the latter is done in the [Custom Aggregation] section.

```{r}
# Recode NA values
tmp_adsl[[1]] <- addNA(tmp_adsl[[1]])

qtable(tmp_adsl, row_vars = "ARM", col_vars = "SEX")
```
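
To see why recoding changes the counts, consider how a count of non-`NA` records behaves in base R. The helper `count_non_na()` below is hypothetical and only mimics the idea; it is not taken from the `rtables` internals.

```{r}
# Hypothetical helper mimicking a non-NA count; not the rtables internals
count_non_na <- function(x) sum(!is.na(x))

x <- c("a", NA, "b", NA)

count_non_na(x)        # NA entries are skipped
count_non_na(addNA(x)) # NA is now a factor level, so every entry is counted
```

After `addNA()`, the missing entries carry a regular factor code for the `NA` level, so `is.na()` no longer flags them.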

In addition, row and column variables should have `NA` levels explicitly
labeled as above. If this is not done, the columns and/or rows will not
reflect the full data.

```{r}
tmp_adsl$new1 <- factor(NA_character_, levels = c("X", "Y", "Z"))
qtable(tmp_adsl, row_vars = "ARM", col_vars = "new1")
```

Explicitly labeling the `NA` levels in the column facet adds a column to the
table:

```{r}
tmp_adsl$new2 <- addNA(tmp_adsl$new1)
levels(tmp_adsl$new2)[4] <- "<NA>" # NA needs to be a recognizable string
qtable(tmp_adsl, row_vars = "ARM", col_vars = "new2")
```
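
The relabeling step can also be written without hard-coding the position of the `NA` level. The helper `explicit_na()` below is a hypothetical convenience function written in base R for illustration; it is not part of `rtables`.

```{r}
# Hypothetical helper: turn missing values into an explicitly labeled level
explicit_na <- function(x, label = "<NA>") {
  x <- addNA(x)                        # append NA as an extra level
  levels(x)[is.na(levels(x))] <- label # rename that level to a string
  x
}

f <- explicit_na(factor(c("X", NA, "Z"), levels = c("X", "Y", "Z")))
levels(f)
table(f)
```

The result could then be used as a row or column variable in the same way as `new2` above.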

## Custom Aggregation

A powerful feature of `qtable()` is that the user can define the type of
function used to summarize the data in each facet. We can specify the
type of analysis summary using the `afun` argument:

```{r}
qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = mean)
```

Note that the analysis variable `AGE` and the analysis function name are
included in the top left header of the table.
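
As a sanity check on the cell values, the same means can be computed in base R with `tapply()`. This is only a cross-check of the numbers, not a description of how `qtable()` computes them; the name `age_means` is illustrative.

```{r}
library(rtables)

# Mean AGE for each combination of STRATA2 (rows) and ARM (columns)
age_means <- with(ex_adsl, tapply(AGE, list(STRATA2, ARM), mean))
age_means
```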

If the analysis function returns a vector of 2 or 3 elements, the values
are displayed together in a single multi-valued cell.

```{r}
qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = range)
```
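
A three-element summary works the same way. The function `quartiles()` below is a hypothetical example defined only for this illustration; it returns the lower quartile, median and upper quartile as an unnamed numeric vector.

```{r}
library(rtables)

# Hypothetical three-element summary: lower quartile, median, upper quartile
quartiles <- function(x) unname(quantile(x, probs = c(0.25, 0.5, 0.75)))

qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = quartiles)
```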

If you want to use an analysis function with more than 3 summary
elements, you can use a list. In this case, the values are displayed in
the table as multiple stacked cells within each facet. If the list
elements are named, the names are used as row labels.

```{r}
fivenum2 <- function(x) {
  setNames(as.list(fivenum(x)), c("min", "Q1", "MED", "Q3", "max"))
}
qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = fivenum2)
```

More advanced formatting can be controlled with `in_rows()`. See
`?in_rows` for more details.

```{r}
meansd_range <- function(x) {
  in_rows(
    "Mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
    "Range" = rcell(range(x), format = "xx - xx")
  )
}

qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = meansd_range)
```

## Marginal Summaries

Another feature of `qtable()` is the ability to quickly add marginal
summary rows with the `summarize_groups` argument. At each level of
nesting, this adds the count of non-NA records of the analysis variable,
together with its percentage of the column count. For example, compare
these two tables:

```{r}
qtable(
  ex_adsl,
  row_vars = c("STRATA1", "STRATA2"), col_vars = "ARM",
  avar = "AGE", afun = mean
)

qtable(
  ex_adsl,
  row_vars = c("STRATA1", "STRATA2"), col_vars = "ARM",
  summarize_groups = TRUE, avar = "AGE", afun = mean
)
```

In the second table, there are marginal summary rows for each level of the
two row facet variables: `STRATA1` and `STRATA2`. The number 18 in the second
row gives the count of observations belonging to `ARM` level "A: Drug X",
`STRATA1` level "A", and `STRATA2` level "S1". The percent is calculated as
the cell count divided by the column count given in the table header. So we can
see that the mean `AGE` of 31.61 in that subgroup is based on 18 subjects,
who make up 13.4% of the subjects in arm "A: Drug X".
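
The count and percentage in a marginal summary cell can be reproduced by hand in base R. The sketch below assumes that `AGE` has no missing values in this subgroup, so that the number of rows equals the count of non-NA records; the variable names are illustrative.

```{r}
library(rtables)

in_arm <- ex_adsl$ARM == "A: Drug X"
in_cell <- in_arm & ex_adsl$STRATA1 == "A" & ex_adsl$STRATA2 == "S1"

n_col <- sum(in_arm)   # column count shown in the table header
n_cell <- sum(in_cell) # subjects in the STRATA1 "A" / STRATA2 "S1" facet

# Count and percentage of the column count, then the facet mean
sprintf("%d (%.1f%%)", n_cell, 100 * n_cell / n_col)
mean(ex_adsl$AGE[in_cell])
```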

See `?summarize_row_groups` for how to add marginal summary rows when
using the core `rtables` framework.

## Table Decorations

Tables generated with `qtable()` can include annotations such as titles,
subtitles and footnotes like so:

```{r}
qtable(
  ex_adsl,
  row_vars = "STRATA2", col_vars = "ARM",
  title = "Strata 2 Summary",
  subtitle = paste0("STUDY ", ex_adsl$STUDYID[1]),
  main_footer = paste0("Date: ", as.character(Sys.Date()))
)
```
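
The annotations are stored on the table object and can be read back afterwards. The sketch below assumes the accessors `main_title()` and `main_footer()` from the `formatters` package, which is attached together with `rtables`; the footer text is an arbitrary example.

```{r}
library(rtables)

tbl <- qtable(
  ex_adsl,
  row_vars = "STRATA2", col_vars = "ARM",
  title = "Strata 2 Summary",
  main_footer = "Source: ex_adsl"
)

main_title(tbl)
main_footer(tbl)
```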

## Summary

Here is what we have learned in this vignette:

-   `qtable()` can replace and extend uses of `table()` and `stats::ftable()`

-   `qtable()` is useful for exploratory data analysis

As the intended use of `qtable()` is for exploratory data analysis,
there is limited functionality for building very complex tables. For
details on how to get started with the core `rtables` layout
functionality see the [`introduction`](https://insightsengineering.github.io/rtables/latest-release/articles/rtables.html)
vignette.