---
title: "Interactive exploration of design matrices with ExploreModelMatrix"
author: "Charlotte Soneson, Federico Marini, Michael I Love, Florian Geier and Michael B Stadler"
date: "`r Sys.Date()`"
output:
  html_vignette
vignette: >
  %\VignetteIndexEntry{ExploreModelMatrix}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
stopifnot(requireNamespace("htmltools"))
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
```
# Introduction
`ExploreModelMatrix` is an R package for visualizing design matrices generated
by the `model.matrix()` R function. Provided with a sample information table
and a design formula, the `ExploreModelMatrix()` function launches a shiny
app where the user can explore the fitted values (in terms of the model
coefficients) for each combination of predictor values. In addition, the app
allows the user to interactively change the design formula and the reference
levels of factor variables as well as drop unwanted columns from the design
matrix, in order to explore the effect on the composition of the fitted values.
Note that `ExploreModelMatrix` is not intended to be used to determine _which_
design formula should be used for analyzing a data set. Instead, its
purpose is to assist in the interpretation of the coefficients in a given
model.
In addition to the interactive visualization, `ExploreModelMatrix` also
provides a function, `VisualizeDesign()`, for generating static
visualizations.
In this vignette, we illustrate how the package can be used by showing
examples of applying the functions to various experimental design setups.
Many examples are taken from questions raised at the
[Bioconductor support site](https://support.bioconductor.org/).
```{r setup}
library(ExploreModelMatrix)
```
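To make the notion of "fitted values in terms of the model coefficients" concrete, the following minimal sketch uses only base R and does not call any `ExploreModelMatrix` function. The data frame `toy` and its `group` column are hypothetical and made up for illustration.

```{r}
# Hypothetical sample table with a single two-level factor
toy <- data.frame(group = factor(c("ctrl", "ctrl", "trt", "trt")))

# Design matrix as generated by model.matrix()
toyX <- model.matrix(~ group, data = toy)
toyX

# Each unique row shows which coefficients add up to the fitted value for
# that group: (Intercept) for ctrl, (Intercept) + grouptrt for trt
unique(toyX)
```

The visualizations in this package convey the same kind of information graphically, which becomes more helpful as the number of predictors grows.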
# Interface
The `ExploreModelMatrix()` function opens a graphical interface where the user
can interactively explore the provided design. This section gives an overview
of what is shown in the graphical interface. A step-by-step tour is also
available by clicking on the icon in the top right corner of the application.
The sidebar contains the input controls. The design formula of interest is
typed into the `Design formula` text box, and must start with the `~` symbol.
It can be changed interactively while using the application. If the
`ExploreModelMatrix()` function is called with `sampleData=NULL`, there will
also be an input control allowing a tab-delimited text file with sample
information to be uploaded to the application. The package also contains a
collection of example designs, suitable for teaching, exploration and
illustration. The remaining input controls allow the user to change the
reference levels of the factor variables, to drop specific columns from the
design matrix, and to change the display settings of the plots.
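Outside the application, changing a reference level or dropping a column can be approximated in base R. The sketch below is only meant to illustrate the effect of these two operations; it is not necessarily how the app implements them, and the data frame `relDat` is hypothetical.

```{r}
# Hypothetical sample table (not taken from the examples in this vignette)
relDat <- data.frame(dose = factor(c("low", "low", "high", "high")))

# By default the reference level is the first level in alphabetical order
# ("high"), so the non-intercept coefficient is named "doselow"
colnames(model.matrix(~ dose, data = relDat))

# After releveling, the non-intercept coefficient is "dosehigh" instead
relDat$dose <- relevel(relDat$dose, ref = "low")
relX <- model.matrix(~ dose, data = relDat)
colnames(relX)

# Dropping a column from the design matrix by name
relX[, colnames(relX) != "dosehigh", drop = FALSE]
```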
The first row of the main body of the application displays the fitted values
(expressed in terms of the model coefficients) for each combination of
predictor values, in both figure and table form. The next row shows the full
sample table together with a summary of it, and the third row
displays the full design matrix as well as its rank. Panels below this
display the pseudoinverse of the design matrix, a visualization of variance
inflation factors, a co-occurrence matrix and the correlation among the
model coefficients.
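Several of the quantities listed above can be approximated with base R, which may help when interpreting the corresponding panels. The following is a sketch under stated assumptions rather than the app's actual implementation: the sample table `pnlDat` is hypothetical, the pseudoinverse formula used here is valid only for a design matrix of full column rank, and variance inflation factors are not covered.

```{r}
# Hypothetical balanced two-factor sample table
pnlDat <- data.frame(genotype = rep(c("A", "B"), each = 4),
                     treatment = rep(c("ctrl", "trt"), 4))
pnlX <- model.matrix(~ genotype + treatment, data = pnlDat)

# Rank of the design matrix (equal to ncol(pnlX) if it is full rank)
qr(pnlX)$rank

# Moore-Penrose pseudoinverse; this closed form assumes full column rank
pnlPinv <- solve(crossprod(pnlX)) %*% t(pnlX)
pnlPinv

# Co-occurrence of the predictor levels
table(pnlDat$genotype, pnlDat$treatment)

# Correlation among the coefficient estimates (the residual variance
# cancels out when converting the covariance to a correlation)
cov2cor(solve(crossprod(pnlX)))
```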
# Examples
This section contains a number of examples of real designs, and shows how they
can be explored with `ExploreModelMatrix`. For each example, the sample
information table is printed out. Next, the `VisualizeDesign()` function is
called to generate a static plot of the fitted values, in terms of the model
coefficients. This is the same plot that is displayed in the top left panel
of the interactive interface generated by `ExploreModelMatrix()`. We also
provide the code for generating and (for interactive sessions) opening
the interactive application with `ExploreModelMatrix()`.
## Example 1
This example illustrates a two-factor design (genotype and treatment), where the
effects of genotype and treatment are assumed to be additive. For each
genotype, two treated and two control individuals are studied. The design
formula is `~ genotype + treatment`, reflecting the assumption of additivity
between the two predictors. The figure generated by the `VisualizeDesign()`
function, displayed below, shows the value of the linear predictor (or, for a
regular linear model, the fitted values) for observations with a given
combination of predictor values, in terms of the model coefficients. This can be
useful in order to set up suitable contrasts. For example, we can see that
testing the null hypothesis that the `genotypeB` coefficient is zero would
correspond to comparing observations with genotype B and those with genotype A.
```{r, fig.width = 5}
(sampleData <- data.frame(genotype = rep(c("A", "B"), each = 4),
                          treatment = rep(c("ctrl", "trt"), 4)))
# Static visualization of the fitted values
vd <- VisualizeDesign(sampleData = sampleData,
                      designFormula = ~ genotype + treatment,
                      textSizeFitted = 4)
cowplot::plot_grid(plotlist = vd$plotlist)
# Interactive application
app <- ExploreModelMatrix(sampleData = sampleData,
                          designFormula = ~ genotype + treatment)
if (interactive()) shiny::runApp(app)
```
```
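The link between the figure and a contrast can also be checked numerically. The sketch below uses base R only and rebuilds the same sample table under the hypothetical name `ex1Dat`. It extracts one design matrix row per combination of predictor values and subtracts two of them to obtain the contrast for genotype B versus genotype A.

```{r}
ex1Dat <- data.frame(genotype = rep(c("A", "B"), each = 4),
                     treatment = rep(c("ctrl", "trt"), 4))
ex1X <- model.matrix(~ genotype + treatment, data = ex1Dat)

# Keep one design matrix row per combination of predictor values
keep <- !duplicated(ex1X)
ex1Rows <- ex1X[keep, ]
rownames(ex1Rows) <- paste(ex1Dat$genotype, ex1Dat$treatment, sep = "_")[keep]
ex1Rows

# Genotype B vs genotype A among the control samples: only the genotypeB
# coefficient remains in the difference
ex1Rows["B_ctrl", ] - ex1Rows["A_ctrl", ]
```

Because the model is additive, the same difference is obtained when comparing the two genotypes among the treated samples.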
## Example 2
From https://support.bioconductor.org/p/121132/. In this example we are
considering a set of patients, each being either Resistant or Sensitive to a
treatment, and each studied before (pre) and after (post) treatment. Patients
have been renumbered within each response group, and patients with only pre- or
post-measurements are removed. We use the design
`~ Response + Response:ind.n + Response:Treatment`.
As can be seen from the visualization below, this lets us easily compare e.g.
post- vs pre-treatment observations within the Sensitive group (via the
`ResponseSensitive.Treatmentpre` coefficient).
```{r, fig.width = 5, fig.height = 12}
(sampleData <- data.frame(
    Response = rep(c("Resistant", "Sensitive"), c(12, 18)),
    Patient = factor(rep(c(1:6, 8, 11:18), each = 2)),
    Treatment = factor(rep(c("pre","post"), 15)),
    ind.n = factor(rep(c(1:6, 2, 5:12), each = 2))))
# Static visualization of the fitted values
vd <- VisualizeDesign(
    sampleData = sampleData,
    designFormula = ~ Response + Response:ind.n + Response:Treatment,
    textSizeFitted = 3
)
cowplot::plot_grid(plotlist = vd$plotlist, ncol = 1)
# Interactive application
app <- ExploreModelMatrix(
    sampleData = sampleData,
    designFormula = ~ Response + Response:ind.n + Response:Treatment
)
if (interactive()) shiny::runApp(app)
```
The design above doesn't allow comparison between Resistant and Sensitive
patients while accounting for the patient effect, since the patient is nested
within the response group. If we choose to ignore the patient effect, we can fit
an additive model with the design formula `~ Treatment + Response`, as
illustrated below.
```{r, fig.width = 5}
vd <- VisualizeDesign(sampleData = sampleData,
                      designFormula = ~ Treatment + Response,
                      textSizeFitted = 4)
cowplot::plot_grid(plotlist = vd$plotlist, ncol = 1)
```
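The nesting of patients within response groups can be made explicit with a rank check. The following base R sketch rebuilds the relevant columns under the hypothetical name `nestDat` and uses the formula `~ Patient + Response`, which is chosen here purely for illustration. It shows that the response effect cannot be separated from the patient effects.

```{r}
nestDat <- data.frame(
    Response = rep(c("Resistant", "Sensitive"), c(12, 18)),
    Patient = factor(rep(c(1:6, 8, 11:18), each = 2))
)
nestX <- model.matrix(~ Patient + Response, data = nestDat)

# The design matrix has 16 columns but only rank 15: the ResponseSensitive
# column equals the sum of the indicator columns of the Sensitive patients
ncol(nestX)
qr(nestX)$rank
```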
## Example 3
From https://support.bioconductor.org/p/80408/. Here we are considering mice
from two conditions (ctrl/ko), each measured with and without treatment with a
drug (plus/minus). We use the design `~ 0 + batch + condition` (where `batch`
corresponds to the mouse ID), and drop the column corresponding to
`conditionko_minus` to get a full-rank design matrix.
```{r, fig.height = 4, fig.width = 6}
(sampleData <- data.frame(
    condition = factor(rep(c("ctrl_minus", "ctrl_plus",
                             "ko_minus", "ko_plus"), 3)),
    batch = factor(rep(1:6, each = 2))))
# Static visualization, dropping the column that causes rank deficiency
vd <- VisualizeDesign(sampleData = sampleData,
                      designFormula = ~ 0 + batch + condition,
                      textSizeFitted = 4, lineWidthFitted = 20,
                      dropCols = "conditionko_minus")
cowplot::plot_grid(plotlist = vd$plotlist, ncol = 1)
# Interactive application; the conditionko_minus column can be dropped
# using the input controls in the sidebar
app <- ExploreModelMatrix(sampleData = sampleData,
                          designFormula = ~ 0 + batch + condition)
if (interactive()) shiny::runApp(app)
```
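The need to drop a column can be verified with base R alone. In the sketch below, which rebuilds the sample table under the hypothetical name `ex3Dat`, the full design matrix has nine columns but only rank eight, and removing `conditionko_minus` restores full rank.

```{r}
ex3Dat <- data.frame(
    condition = factor(rep(c("ctrl_minus", "ctrl_plus",
                             "ko_minus", "ko_plus"), 3)),
    batch = factor(rep(1:6, each = 2))
)
ex3X <- model.matrix(~ 0 + batch + condition, data = ex3Dat)

# Number of columns and rank of the full design matrix
c(ncol = ncol(ex3X), rank = qr(ex3X)$rank)

# Removing the conditionko_minus column gives a full-rank matrix
ex3Xred <- ex3X[, colnames(ex3X) != "conditionko_minus"]
c(ncol = ncol(ex3Xred), rank = qr(ex3Xred)$rank)
```

The deficiency arises because every mouse belongs to exactly one genotype, so the `conditionko_minus` and `conditionko_plus` columns together sum to the indicator columns of the ko mice.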
# Session info
```{r}
sessionInfo()
```