---
title: "Summarise temporal trends in OMOP tables"
output: 
  html_document:
    pandoc_args: [
      "--number-offset=1,0"
      ]
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{summarise_trend}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette, we will explore the *OmopSketch* function `summariseTrend()`, which summarises temporal trends from OMOP CDM tables.
It lets you track how key measures (such as number of records, number of persons, person-days, age, or sex distribution) change over time, and its results can then be displayed with `tableTrend()` and `plotTrend()`.

## Create a mock cdm

Let's start by loading the essential packages and creating a mock CDM using the R package [omock](https://ohdsi.github.io/omock/).

```{r, warning=FALSE}
library(omock)
library(OmopSketch)
library(dplyr)
library(visOmopResults)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")

cdm
```

# Summarise temporal trends

Let’s use `summariseTrend()` to get an overview of how the content of the tables changes over time.
In this example, we’ll summarise yearly trends for the *condition_occurrence* and *drug_exposure* tables, and also include *observation_period* as an episode table.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = c("condition_occurrence", "drug_exposure"),
  episode = "observation_period",
  interval = "years"
)

summarisedResult |>
  glimpse()
```

Notice that the output is in the [summarised result](https://darwin-eu.github.io/omopgenerics/articles/summarised_result.html) format.

## What are Event and Episode tables?

- **Event** tables capture occurrences that happen at a single point in time (for example, a diagnosis, a prescription, or a measurement).
For these tables, each record is linked to a time interval based only on its **start date**.

- **Episode** tables describe periods that extend over time (for example, observation periods or treatment eras).
Each record contributes to every time interval between its start and end dates, reflecting its entire duration within the study period.
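
To make the difference concrete, the chunk below works through a toy example in base R. The two small tables are hypothetical (they are not part of GiBleed), and the code is only a sketch of the counting rule described above, not the way *OmopSketch* implements it: event records are counted in the year of their start date, while episode records are counted in every calendar year they overlap.

```{r}
# Hypothetical toy tables, for illustration only
toyEvent <- data.frame(
  person_id = c(1, 1, 2),
  start_date = as.Date(c("2001-03-10", "2003-07-01", "2003-11-20"))
)
toyEpisode <- data.frame(
  person_id = c(1, 2),
  start_date = as.Date(c("2000-06-01", "2002-01-15")),
  end_date = as.Date(c("2002-02-28", "2003-12-31"))
)

# Event table: each record falls in the year of its start date only
eventCounts <- table(format(toyEvent$start_date, "%Y"))
eventCounts

# Episode table: each record falls in every year from its start to its end
episodeYears <- unlist(lapply(seq_len(nrow(toyEpisode)), function(i) {
  startYear <- as.integer(format(toyEpisode$start_date[i], "%Y"))
  endYear <- as.integer(format(toyEpisode$end_date[i], "%Y"))
  seq(startYear, endYear)
}))
episodeCounts <- table(episodeYears)
episodeCounts
```

Person 1's episode, for example, runs from 2000 to 2002 and is therefore counted once in each of those three years.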

You can check whether a table was treated as an event or an episode table in the settings of the summarised result:

```{r}
summarisedResult |>
  addSettings(settingsColumn = "type") |>
  glimpse()
```

## Outputs

You can choose what to summarise using the `output` argument.
Options include:

- "record": Number of records (default value)

- "person": Number of distinct persons

- "person-days": Number of person-days (episode tables only)

- "age": Median age at start date of each interval

- "sex": Number of females

### Records and subjects per year

For each time interval, the results include the number of records and the number of individuals observed during that period.
In addition to absolute counts, the function also reports the percentage of records and individuals within each interval relative to the total counts in the entire table.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  output = c("record", "person"),
  interval = "years"
)

summarisedResult |>
  select(group_level, variable_name, additional_level, estimate_name, estimate_value)
```
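
The percentages can be reproduced by hand. The following sketch uses a hypothetical event table and plain base R (not the package internals) to count records and distinct persons per year and divide each count by the corresponding total for the whole table.

```{r}
# Hypothetical toy event table, for illustration only
toyRecords <- data.frame(
  person_id = c(1, 1, 2, 3, 3),
  start_date = as.Date(c("2001-02-01", "2002-05-10", "2002-08-30",
                         "2002-09-15", "2003-01-20"))
)
toyRecords$year <- format(toyRecords$start_date, "%Y")

# Records per year, and as a percentage of all records in the table
nRecords <- table(toyRecords$year)
pctRecords <- 100 * nRecords / nrow(toyRecords)

# Distinct persons per year, and as a percentage of all distinct persons
nPersons <- tapply(toyRecords$person_id, toyRecords$year,
                   function(x) length(unique(x)))
pctPersons <- 100 * nPersons / length(unique(toyRecords$person_id))

data.frame(
  year = names(nRecords),
  records = as.vector(nRecords),
  pct_records = as.vector(pctRecords),
  persons = as.vector(nPersons),
  pct_persons = round(as.vector(pctPersons), 1)
)
```

Because the same person can contribute to several years, the person percentages do not need to add up to 100.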

### Person-days

When an episode table is specified, you can include "person-days" in the output to summarise total follow-up time across intervals. The results will show both the number of person-days in each interval and the percentage of person-days relative to the total accumulated across the entire table.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  episode = "observation_period",
  output = "person-days",
  interval = "years"
)

summarisedResult |>
  select(group_level, variable_name, additional_level, estimate_name, estimate_value)
```
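
Person-days can be thought of as the number of days each episode overlaps each interval. The sketch below illustrates this with hypothetical observation periods and base R. It assumes calendar-year intervals and that both the first and the last day of an overlap are counted; *OmopSketch* may handle these details differently.

```{r}
# Hypothetical toy episode table, for illustration only
toyObs <- data.frame(
  person_id = c(1, 2),
  start_date = as.Date(c("2000-12-30", "2001-06-01")),
  end_date = as.Date(c("2001-01-02", "2001-06-10"))
)
toyYears <- 2000:2001

personDays <- sapply(toyYears, function(y) {
  intervalStart <- as.Date(paste0(y, "-01-01"))
  intervalEnd <- as.Date(paste0(y, "-12-31"))
  # Clip every episode to the interval; both boundary days are counted
  overlapStart <- pmax(toyObs$start_date, intervalStart)
  overlapEnd <- pmin(toyObs$end_date, intervalEnd)
  days <- as.numeric(overlapEnd - overlapStart) + 1
  # Episodes that do not overlap the interval give days <= 0 and are dropped
  sum(days[days > 0])
})
names(personDays) <- toyYears

personDays
round(100 * personDays / sum(personDays), 1)
```

Here the percentages are taken relative to the person-days summed over all intervals, which equals the total for the table because these intervals cover every episode.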

Note: the function automatically skips "person-days" for event tables, as in the example below, where *visit_occurrence* is passed as an event table.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "visit_occurrence",
  output = "person-days",
  interval = "years"
)
summarisedResult
```

### Age

When "age" is included in the `output` argument, the function reports the median age of individuals for each time interval.
For every record, age is calculated either at the start of the time interval or at the record’s start date, whichever comes first. This allows you to examine how the age distribution of individuals evolves over time for a given event or episode table.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  output = "age",
  interval = "years"
)

summarisedResult |>
  select(variable_name, additional_level, estimate_name, estimate_value)
```
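
Whichever date is used as the reference, age is expressed in completed years. As an illustration only, the helper `ageInYears()` below (a hypothetical function, not part of *OmopSketch*) computes age in completed years from a birth date and a reference date, and the median is then taken across records. All dates are made up.

```{r}
# Hypothetical helper: age in completed years at date `at` for people born on `birth`
ageInYears <- function(birth, at) {
  b <- as.POSIXlt(birth)
  a <- as.POSIXlt(at)
  years <- a$year - b$year
  # Subtract one year if the birthday has not yet been reached on `at`
  notYet <- a$mon < b$mon | (a$mon == b$mon & a$mday < b$mday)
  years - notYet
}

# Hypothetical birth dates and the dates at which age is evaluated
toyBirth <- as.Date(c("1980-06-15", "1975-01-01", "1990-12-31"))
toyAgeDate <- as.Date(c("2010-01-01", "2010-01-01", "2010-06-30"))

agesAtDate <- ageInYears(toyBirth, toyAgeDate)
agesAtDate
median(agesAtDate)
```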

### Sex

When "sex" is included in the `output` argument, the function counts the number of females in each time interval.
It also provides the percentage of females relative to the total number of individuals in the entire table. This output is particularly useful for exploring changes in the sex distribution of records over time.


```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  output = "sex",
  interval = "years"
)
summarisedResult |>
  select(variable_name, additional_level, estimate_name, estimate_value)
```
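
A toy version of this count is sketched below in base R. The data are hypothetical, and the sketch assumes that females are counted as distinct persons within each interval, with the percentage taken over all distinct persons in the table; check the package output for the exact definition.

```{r}
# Hypothetical toy records with the person's sex, for illustration only
toySex <- data.frame(
  person_id = c(1, 2, 2, 3, 3, 4),
  sex = c("Female", "Male", "Male", "Female", "Female", "Female"),
  year = c("2001", "2001", "2002", "2002", "2002", "2002")
)

# Keep one row per person and year, so each person is counted once per year
personYear <- unique(toySex)
nFemales <- tapply(personYear$sex == "Female", personYear$year, sum)

# Percentage relative to all distinct persons in the table
pctFemales <- 100 * nFemales / length(unique(toySex$person_id))
nFemales
pctFemales
```

Person 3 has two records in 2002 but is counted only once in that year.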

## Intervals

The `interval` argument controls the temporal granularity of the results.
Possible values are "overall" (default, no stratification by time), "years", "quarters", and "months".

For example, to see quarterly trends:

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  interval = "quarters",
  output = "record"
)

summarisedResult |>
  select(additional_level, estimate_value)
```
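
Conceptually, each interval groups dates by calendar year, quarter or month. The base R sketch below, on hypothetical dates, shows how a date maps to each of these periods. The `"2010 Q1"` style labels are our own invention and need not match the labels that appear in `additional_level`.

```{r}
# Hypothetical record dates, for illustration only
toyDates <- as.Date(c("2010-02-14", "2010-05-01", "2010-11-30", "2011-01-05"))

yearLabel <- format(toyDates, "%Y")
monthLabel <- format(toyDates, "%Y-%m")
# Quarter number: months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
quarterNumber <- (as.integer(format(toyDates, "%m")) - 1) %/% 3 + 1
quarterLabel <- paste0(yearLabel, " Q", quarterNumber)

data.frame(date = toyDates, year = yearLabel,
           quarter = quarterLabel, month = monthLabel)
```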

## Stratify by age and sex

You can use the arguments `ageGroup` and `sex` to stratify the results.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  interval = "years",
  output = c("record", "age", "sex"),
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)),
  sex = TRUE
)

summarisedResult |>
  select(variable_name, strata_level, estimate_name, estimate_value)
```

By default, the output includes the "overall" group as well as combined strata (e.g., Female and >=35).
Note that for `output = "sex"`, sex stratification is not applied because a single estimate summarising the female population is returned.
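
The sketch below mimics this stratification in base R on hypothetical data. The helper `assignAgeGroup()` is our own (not an *OmopSketch* function) and assumes that both bounds of each age group are inclusive, which is how `c(0, 34)` and `c(35, Inf)` fit together without a gap. The final cross-tabulation corresponds to the combined age group and sex strata.

```{r}
# Hypothetical ages and sexes, for illustration only
toyAgeGroup <- list("<35" = c(0, 34), ">=35" = c(35, Inf))
toyPeople <- data.frame(
  age = c(12, 34, 35, 80),
  sex = c("Female", "Male", "Female", "Female")
)

# Label each age with the name of the first group whose bounds contain it
assignAgeGroup <- function(age, groups) {
  label <- rep(NA_character_, length(age))
  for (nm in names(groups)) {
    bounds <- groups[[nm]]
    inGroup <- is.na(label) & age >= bounds[1] & age <= bounds[2]
    label[inGroup] <- nm
  }
  label
}
toyPeople$age_group <- assignAgeGroup(toyPeople$age, toyAgeGroup)

# Overall count, single strata, and combined age group and sex strata
nrow(toyPeople)
table(toyPeople$age_group)
table(toyPeople$sex)
combined <- table(toyPeople$age_group, toyPeople$sex)
combined
```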

## In-observation stratification

When `inObservation = TRUE`, the results will indicate whether each record occurred within the subject’s observation period.
This can be useful for identifying data quality issues or assessing completeness.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  interval = "overall",
  output = "record",
  inObservation = TRUE
)

summarisedResult |>
  select(variable_name, strata_name, strata_level, estimate_name, estimate_value)
```
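
A minimal base R sketch of the in-observation check is shown below on hypothetical data. It assumes one observation period per person and tests only the record's start date; real CDMs can have several observation periods per person, and *OmopSketch* may apply a different rule.

```{r}
# Hypothetical observation periods (one per person) and condition records
toyObsPeriod <- data.frame(
  person_id = c(1, 2),
  start_date = as.Date(c("2000-01-01", "2005-01-01")),
  end_date = as.Date(c("2004-12-31", "2009-12-31"))
)
toyCondition <- data.frame(
  person_id = c(1, 1, 2),
  condition_start_date = as.Date(c("2003-06-01", "2006-02-01", "2004-03-15"))
)

# Find each record's observation period and test its start date against it
periodRow <- match(toyCondition$person_id, toyObsPeriod$person_id)
toyCondition$in_observation <-
  toyCondition$condition_start_date >= toyObsPeriod$start_date[periodRow] &
  toyCondition$condition_start_date <= toyObsPeriod$end_date[periodRow]
toyCondition
```

Person 2's record starts before that person's observation period, so it is flagged as out of observation.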

## Date range

You can restrict the study period using the `dateRange` argument.

```{r}
summarisedResult <- summariseTrend(
  cdm = cdm,
  event = "drug_exposure",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)

summarisedResult |>
  settings() |>
  glimpse()
```

# Tidy the summarised object with tableTrend

`tableTrend()` converts a summarised result into a formatted table for reporting or inspection. Several table types are available: [gt](https://gt.rstudio.com/) (default), [flextable](https://www.rdocumentation.org/packages/flextable), [reactable](https://glin.github.io/reactable/) and [DT::datatable](https://rstudio.github.io/DT/). It formats time intervals, strata and estimate columns so the results are easy to read and export.

```{r}
result <- summariseTrend(
  cdm = cdm,
  event = "condition_occurrence",
  episode = "drug_exposure",
  output = "age",
  interval = "years"
)
tableTrend(result = result)
```

# Visualise trends with plotTrend

`plotTrend()` builds a ggplot2 visualisation from a summarised result.

```{r}
result <- summariseTrend(
  cdm = cdm,
  event = "measurement", 
  interval = "quarters",
  sex = TRUE, 
  ageGroup = list(c(0, 17), c(18, Inf)),
  dateRange = as.Date(c("2010-01-01", "2019-12-31"))
)

plotTrend(
  result = result,
  colour = "sex",
  facet = "age_group"
)
```

When the result includes several outputs (for example, records, persons, or person-days), the function defaults to plotting the number of records.
You can override this by setting the `output` argument to the measure you want to visualise.

```{r}
result <- summariseTrend(cdm,
  event = "measurement",
  interval = "quarters",
  output = c("sex", "record"),
  dateRange = as.Date(c("2010-01-01", "2019-12-31"))
)
plotTrend(
  result = result,
  output = "sex"
)
```

You can also control the layout with the `facet` argument (a formula or column names) and the `colour` argument. Valid column names are the columns of the tidied result (see `visOmopResults::tidyColumns()`).

```{r}
result <- summariseTrend(cdm,
  event = "measurement",
  interval = "quarters",
  sex = TRUE,
  inObservation = TRUE,
  dateRange = as.Date(c("2010-01-01", "2019-12-31"))
)
plotTrend(
  result = result,
  facet = omop_table ~ sex,
  colour = "in_observation"
)
```

# Disconnect from CDM

Finally, disconnect from the mock CDM.

```{r}
cdmDisconnect(cdm = cdm)
```